About This Book


This manual describes the central processing unit of the DSP56800 Family in detail. It is intended to be 
used with the appropriate DSP56800 Family member user’s manual, which describes the central 
processing unit, programming models, and details of the instruction set. The appropriate DSP56800 
Family member technical data sheet provides timing, pinout, and packaging descriptions. 


This manual provides practical information to help the user accomplish the following: 
• Understand the operation and instruction set of the DSP56800 Family
• Write code for DSP algorithms
• Write code for general control tasks
• Write code for communication routines
• Write code for data manipulation algorithms


Audience 


The information in this manual is intended to assist design and software engineers with integrating a 
DSP56800 Family device into a design and with developing application software. 


Organization 


Information in this manual is organized into chapters by topic. The contents of the chapters are as follows: 


Chapter 1, “Introduction.” This section introduces the DSP56800 core architecture and its application. It 
also provides the novice with a brief overview of digital signal processing. 


Chapter 2, “Core Architecture Overview.” The DSP56800 core architecture consists of the data 
arithmetic logic unit (ALU), address generation unit (AGU), program controller, bus and bit-manipulation 
unit, and a JTAG/On-Chip Emulation (OnCE™) port. This section describes each subsystem and the buses 
interconnecting the major components in the DSP56800 central processing module. 


Chapter 3, “Data Arithmetic Logic Unit.” This section describes the data ALU architecture, its 
programming model, an introduction to fractional and integer arithmetic, and a discussion of other topics 
such as unsigned and multi-precision arithmetic on the DSP56800 Family. 


Chapter 4, “Address Generation Unit.” This section describes the AGU architecture and its 
programming model, addressing modes, and address modifiers. 


Chapter 5, “Program Controller.” This section describes in detail the program controller architecture, its 
programming model, and hardware looping. Note, however, that the different processing states of the 
DSP56800 core, including interrupt processing, are described in Chapter 7, “Interrupts and the Processing 
States.” 


Chapter 6, “Instruction Set Introduction.” This section presents an introduction to parallel moves and a 
brief description of the syntax, instruction formats, operand and memory references, data organization, 
addressing modes, and instruction set. It also includes a summary of the instruction set, showing the 
registers and addressing modes available to each instruction. A detailed description of each instruction is 
given in Appendix A, “Instruction Set Details.” 


Chapter 7, “Interrupts and the Processing States.” This section describes five of the six processing 
states (normal, exception, reset, wait, and stop). The sixth processing state (debug) is covered more 
completely in Chapter 9, “JTAG and On-Chip Emulation (OnCE™).” 


Chapter 8, “Software Techniques.” This section teaches the advanced user techniques for more efficient 
programming of the DSP56800 Family. It includes a description of useful instruction sequences and 
macros, optimal loop and interrupt programming, topics related to the stack of the DSP56800, and other 
useful software topics. 


Chapter 9, “JTAG and On-Chip Emulation (OnCE™).” This section describes the combined 
JTAG/OnCE port and its functions. These two are integrally related, sharing the same pins for I/O, and are 
presented together in this section. 


Appendix A, “Instruction Set Details.” This section presents a detailed description of each DSP56800 
Family instruction, its use, and its effect on the processor. 


Appendix B, “DSP Benchmarks.” DSP56800 Family benchmark example programs and results are listed 
in this appendix. 


Suggested Reading 


A list of DSP-related books is included here as an aid for the engineer who is new to the field of DSP: 
Advanced Topics in Signal Processing, Jae S. Lim and Alan V. Oppenheim (Prentice-Hall: 1988). 
Applications of Digital Signal Processing, A. V. Oppenheim (Prentice-Hall: 1978). 

Digital Processing of Signals: Theory and Practice, Maurice Bellanger (John Wiley and Sons: 1984). 
Digital Signal Processing, Alan V. Oppenheim and Ronald W. Schafer (Prentice-Hall: 1975). 


Digital Signal Processing: A System Design Approach, David J. DeFatta, Joseph G. Lucas, and William S. 
Hodgkiss (John Wiley and Sons: 1988). 


Discrete-Time Signal Processing, A. V. Oppenheim and R.W. Schafer (Prentice-Hall: 1989). 
Foundations of Digital Signal Processing and Data Analysis, J. A. Cadzow (Macmillan: 1987). 
Handbook of Digital Signal Processing, D. F. Elliott (Academic Press: 1987). 

Introduction to Digital Signal Processing, John G. Proakis and Dimitris G. Manolakis (Macmillan: 1988). 
Multirate Digital Signal Processing, R. E. Crochiere and L. R. Rabiner (Prentice-Hall: 1983). 

Signal Processing Algorithms, S. Stearns and R. Davis (Prentice-Hall: 1988). 

Signal Processing Handbook, C. H. Chen (Marcel Dekker: 1988). 

Signal Processing: The Modern Approach, James V. Candy (McGraw-Hill: 1988). 


Theory and Application of Digital Signal Processing, Lawrence R. Rabiner and Bernard Gold 
(Prentice-Hall: 1975). 


Conventions 


This document uses the following notational conventions: 


Bits within registers are always listed from most significant bit (MSB) to least significant bit (LSB). 


Bits within a register are formatted AA[n:0] when more than one bit is involved in a description. 
For purposes of description, the bits are presented as if they are contiguous within a register. 
However, this is not always the case. Refer to the programming model diagrams or to the 
programmer’s sheets to see the exact location of bits within a register. 


When a bit is described as “set,” its value is set to 1. When a bit is described as “cleared,” its value 
is set to 0. 


Memory addresses in the separate program and data memory spaces are differentiated by a 
one-letter prefix. Data memory addresses are preceded by “X:” while program memory addresses 
have a “P:” prefix. For example, “P:$0200” indicates a location in program memory. 


Hex values are indicated with a dollar sign ($) preceding the hex value, as follows: $FFFB is the X 
memory address for the Interrupt Priority Register (IPR). 
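

As a small illustration of the address and hex conventions above, the following Python sketch splits a string such as “P:$0200” into a memory space and an integer address. The function name `parse_address` is hypothetical and is not part of any Motorola tool; it only mirrors the notation described here, and it assumes 16-bit addresses.

```python
def parse_address(text):
    """Split a manual-style address such as 'X:$FFFB' into (space, value).

    Hypothetical helper for illustration only: 'X' means data memory,
    'P' means program memory, and '$' marks a hexadecimal value.
    """
    space, sep, value = text.partition(":")
    if sep != ":" or space not in ("X", "P"):
        raise ValueError("expected an 'X:' or 'P:' prefix: %r" % text)
    if not value.startswith("$"):
        raise ValueError("expected a '$'-prefixed hex value: %r" % text)
    address = int(value[1:], 16)
    if not 0 <= address <= 0xFFFF:      # assumption: 16-bit address space
        raise ValueError("address does not fit in 16 bits: %r" % text)
    return space, address


print(parse_address("P:$0200"))   # program memory, address 0x0200
print(parse_address("X:$FFFB"))   # data memory, the IPR address quoted above
```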


Code examples are displayed in a monospaced font, as follows: 


BFSET   #$0007,X:PCC    ; Configure:                        line 1
                        ; MISO0, MOSI0, SCK0 for SPI master line 2
                        ; ~SS0 as PC3 for GPIO              line 3


Definitions, Acronyms, and Abbreviations 


The following terms appear frequently in this manual: 


DSP digital signal processor 
JTAG Joint Test Action Group 
OnCE™ On-Chip Emulation 
ALU arithmetic logic unit 
AGU address generation unit 


A complete list of relevant terms is included in the Glossary at the end of this manual. 




Chapter 1 
Introduction 


The DSP56800 Digital Signal Processors provide low cost, low power, mid-performance computing, 
combining DSP power and parallelism with MCU-like programming simplicity. The DSP56800 core is a 
general-purpose central processing unit, designed for both efficient digital signal processing and a variety 
of controller operations. 


1.1 DSP56800 Family Architecture 


The DSP56800 Family uses the DSP56800 16-bit DSP core. This core is a general-purpose central 
processing unit (CPU), designed for both efficient DSP and controller operations. Its instruction-set 
efficiency as a DSP is superior to other low-cost DSP architectures and has been designed for efficient, 
straightforward coding of controller-type tasks. 


[Block diagram; recoverable labels: 16-bit DSP CPU core, peripherals, I/O pins, external address bus interface, JTAG I/O]
Figure 1-1. DSP56800-Based DSP Microcontroller Chip 


The general-purpose MCU-style instruction set, with its powerful addressing modes and bit-manipulation 
instructions, enables a user to begin writing code immediately, without having to worry about the 
complexities previously associated with DSPs. A software stack allows for unlimited interrupt and 
subroutine nesting, as well as support for structured programming techniques such as parameter passing 
and the use of local variables. The veteran DSP programmer sees a powerful DSP instruction set with 
many different arithmetic operations and flexible single- and dual-memory moves that can occur in parallel 
with an arithmetic operation. The general-purpose nature of the instruction set also allows for an efficient 
compiler implementation. 


A variety of standard peripherals can be added around the DSP56800 core (see Figure 1-1 on page 1-1) 
such as serial ports, general-purpose timers, real-time and watchdog timers, different memory 
configurations (RAM, ROM, or both), and general-purpose I/O (GPIO) ports. 


On-Chip Emulation (OnCE™) capability is provided through a debug port conforming to the Joint Test 
Action Group (JTAG) standard. This provides real-time, embedded system debugging with on-chip 
emulation capability through the five-pin JTAG interface. A user can set hardware and software 
breakpoints, display and change registers and memory locations, and single step or step through multiple 
instructions in an application. 


The DSP56800’s efficient instruction set, multiple internal buses, on-chip program and data memories, 
external bus interface, standard peripherals, and industry-standard debug support make the DSP56800 
Family an excellent solution for real-time embedded control tasks. It is an excellent fit for wireless or 
wireline DSP applications, digital control, and controller applications in need of more processing power. 


1.1.1 Core Overview 


The DSP56800 core is a programmable 16-bit CMOS digital signal processor that consists of a 16-bit data 
arithmetic logic unit (ALU), a 16-bit address generation unit (AGU), a program decoder, On-Chip 
Emulation (OnCE), associated buses, and an instruction set. Figure 1-2 on page 1-3 shows a block diagram 
of the DSP56800 core. The main features of the DSP56800 core include the following: 


• Processing capability of up to 35 million instructions per second (MIPS) at 70 MHz
• Operates from a 2.7 V to 3.6 V supply
• Single-instruction-cycle 16-bit × 16-bit parallel multiply-accumulator
• Two 36-bit accumulators, including extension bits
• Single-instruction 16-bit barrel shifter
• Parallel instruction set with unique DSP addressing modes
• Hardware DO and REP loops
• Two external interrupt request pins
• Three 16-bit internal core data buses
• Three 16-bit internal address buses
• Instruction set that supports both DSP and controller functions
• Controller-style addressing modes and instructions for smaller code size
• Efficient C compiler and local variable support
• Software subroutine and interrupt stack with unlimited depth
• On-Chip Emulation for unobtrusive, processor-speed-independent debugging
• Low-power wait and stop modes
• Operating frequency down to DC
• Single power supply


Figure 1-2. DSP56800 Core Block Diagram 


1.1.2 Peripheral Blocks 


The following peripheral blocks are available for members of the DSP56800 16-bit Family: 
• Program ROM and RAM modules
• Bootstrap ROM for program RAM parts
• Data ROM and RAM modules
• Phase-locked loop (PLL) module
  - 32.0 kHz and 38.4 kHz crystals accepted
  - Crystal frequencies ≥ 1 MHz accepted
  - Programmable multiplication factor
  - Three pins required (SXFC, VDDS, and GNDS)
• 16-bit Timer Module
  - Three independent 16-bit timers
  - Each may be clocked from a pin, the oscillator clock, or the PLL output
  - Zero to two pins required
• Computer operating properly (COP) and real-time timer module
  - COP timer uses output of real-time timer chain
  - Programmable real-time timer
  - Count register readable
  - No pins required
• Synchronous serial interface module (SSI)
  - Synchronous serial interface for hooking up to codecs
  - Frame sync and gated clock modes
  - Independent transmit and receive channels
  - Up to 32-slot network mode available
  - Three to six pins required
• Serial peripheral interface (SPI)
  - Simple, synchronous, 8-bit serial interface for interfacing to MCUs and MCU-style peripherals
  - Master and slave modes
  - Four pins required
• Programmable general-purpose I/O
  - Pins can be individually programmed as input or output
  - Pins can be individually multiplexed between peripheral functionality and GPIO
  - Pins can have interrupt capability


More blocks will be defined in the future to meet customer needs. 


1.1.3 Family Members 


The DSP56800 core is designed as the core processor for a family of Motorola DSPs. An example 
of a chip that can be built with this core is shown in Figure 1-3 on page 1-5. 


[Block diagram; recoverable labels: IRQA, IRQB, 16Kx16, DSP56800 16-bit DSP core, external bus interface (ADR, DATA), watchdog and real-time timers, serial]
Figure 1-3. Example of Chip Built Around the DSP56800 Core 


1.2 Introduction to Digital Signal Processing 


DSP is the arithmetic processing of real-time signals sampled at regular intervals and digitized. Examples 
of DSP processing include the following: 


• Filtering
• Convolution (mixing two signals)
• Correlation (comparing two signals)
• Rectification, amplification, and transformation


Figure 1-4 on page 1-6 shows an example of analog signal processing. The circuit in the illustration filters 
a signal from a sensor using an operational amplifier and controls an actuator with the result. Since the 
ideal filter is impossible to design, the engineer must design the filter for acceptable response by 
considering variations in temperature, component aging, power-supply variation, and component accuracy. 
The resulting circuit typically has low noise immunity, requires adjustments, and is difficult to modify. 


Figure 1-4. Analog Signal Processing 


The equivalent circuit using a DSP is shown in Figure 1-5 on page 1-7. This application requires an 
analog-to-digital (A/D) converter and digital-to-analog (D/A) converter in addition to the DSP. Even with 
these additional parts, the component count can be lower using a DSP due to the high integration available 
with current components. 


Figure 1-5. Digital Signal Processing 


Processing in this circuit begins by band limiting the input signal with an anti-alias filter, eliminating 
out-of-band signals that can be aliased back into the pass band due to the sampling process. The signal is 
then sampled, digitized with an A/D converter, and sent to the DSP. 
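

To see why the anti-alias filter is needed, the following Python sketch computes the frequency at which a tone appears after sampling. It is a minimal illustration of standard sampling theory, not something taken from this manual; the function name `alias_frequency` and the example sampling rate are assumptions chosen for the example.

```python
def alias_frequency(f_in, f_s):
    """Return the apparent frequency (Hz) of a tone f_in sampled at f_s."""
    f = f_in % f_s              # sampling cannot tell f apart from f + k*f_s
    return min(f, f_s - f)      # fold into the range 0 .. f_s/2


F_S = 8000                            # assumed sampling rate in Hz
print(alias_frequency(1000, F_S))     # 1000: in band, unchanged
print(alias_frequency(5000, F_S))     # 3000: folded back into the pass band
print(alias_frequency(9000, F_S))     # 1000: indistinguishable from a 1 kHz tone
```

Once sampled, the 9 kHz tone cannot be separated from a genuine 1 kHz signal, which is why out-of-band energy has to be removed before the A/D converter.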


The filter implemented by the DSP is strictly a matter of software. The DSP can directly employ any filter 
that can also be implemented using analog techniques. Also, adaptive filters can be easily put into practice 
using DSP, whereas these filters are extremely difficult to implement using analog techniques. (Similarly, 
compression can also be implemented on a DSP.) 


The DSP output is processed by a D/A converter and is low-pass filtered to remove the effects of 
digitizing. In summary, the advantages of using the DSP include the following: 


• Fewer components
• Stable, deterministic performance
• No filter adjustments
• Wide range of applications
• Filters with much closer tolerances
• High noise immunity
• Adaptive filters easily implemented
• Self-test can be built in
• Better power-supply rejection


The DSP56800 Family is not a custom IC designed for a particular application; it is designed as a 
general-purpose DSP architecture to efficiently execute commonly used DSP benchmarks and controller 
code in minimal time. 


As shown in Figure 1-6, the key attributes of a DSP are as follows: 
• Multiply/accumulate (MAC) operation
• Fetching up to two operands per instruction cycle for the MAC
• Program control to provide versatile operation
• Input/output to move data in and out of the DSP


[Figure: FIR filter, the sum over k of c(k) × x(n − k)]
Figure 1-6. Mapping DSP Algorithms into Hardware 


The multiply-accumulation (MAC) operation is the fundamental operation used in DSP. The DSP56800 
Family of processors has a dual Harvard architecture optimized for MAC operations. Figure 1-6 on 
page 1-8 shows how the DSP56800 architecture matches the shape of the MAC operation. The two 
operands, c() and x(), are directed to a multiply operation, and the result is summed. This process is built 
into the chip by allowing two separate data-memory accesses to feed a single-cycle MAC. The entire 
process must occur under program control to direct the correct operands to the multiplier and save the 
accumulated result as needed. Since the memory and the MAC are independent, the DSP can perform two 
memory moves, a multiply and an accumulate, and two address updates in a single operation. As a result, 
many DSP benchmarks execute very efficiently for a single-multiplier architecture. 
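

The MAC loop described above can be sketched in a few lines of Python. This is only a behavioral illustration of the FIR sum of c(k) × x(n − k); it does not model the parallel memory moves or address updates of the DSP56800, and the function name `fir_output` is hypothetical.

```python
def fir_output(coeffs, samples):
    """Compute one FIR output: the sum of c(k) * x(n - k) over all taps.

    samples[k] is assumed to hold x(n - k), that is, newest sample first.
    """
    if len(coeffs) != len(samples):
        raise ValueError("need one sample per coefficient")
    acc = 0                          # plays the role of the accumulator
    for c_k, x_nk in zip(coeffs, samples):
        acc += c_k * x_nk            # one multiply-accumulate per tap
    return acc


coeffs = [1, 2, 3]                   # c(0), c(1), c(2)
samples = [4, 5, 6]                  # x(n), x(n-1), x(n-2)
print(fir_output(coeffs, samples))   # 1*4 + 2*5 + 3*6 = 32
```

Each pass through the loop body corresponds to the work the text says the core can do in a single operation: fetch two operands, multiply, accumulate, and advance both pointers.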


1.3 Summary of Features 


The high throughput of the DSP56800 Family processors makes them well-suited for wireless and wireline 
communication, high-speed control, low-cost voice processing, numeric processing, and computer and 
audio applications. The main features that contribute to this high throughput include the following: 


• Speed: The DSP56800 supports most mid-performance DSP applications.


• Precision: The data paths are 16 bits wide, providing 96 dB of dynamic range; intermediate results 
held in the 36-bit accumulators can range over 216 dB.


• Parallelism: Each on-chip execution unit, memory, and peripheral operates independently and in 
parallel with the other units through a sophisticated bus system. The data ALU, AGU, and program 
controller operate in parallel so that the following can be executed in a single instruction:

  - An instruction pre-fetch
  - A 16-bit × 16-bit multiplication
  - A 36-bit addition
  - Two data moves
  - Two address-pointer updates using one of two types of arithmetic (linear or modulo)
  - Sending and receiving full-duplex data by the serial ports
  - Timers continuing to count in parallel


• Flexibility: While many other DSPs need external communications circuitry to interface with 
peripheral circuits (such as A/D converters, D/A converters, or host processors), the DSP56800 
Family provides on-chip serial and parallel interfaces that can support various configurations of 
memory and peripheral modules. The peripherals are interfaced to the DSP56800 core through a 
peripheral interface bus, designed to provide a common interface to many different peripherals.


• Sophisticated debugging: Motorola’s On-Chip Emulation technology (OnCE) allows simple, 
inexpensive, and speed-independent access to the internal registers for debugging. OnCE tells 
application programmers exactly what the status is within the registers, memory locations, and even 
the last instructions that were executed.


• Phase-locked loop (PLL)-based clocking: The PLL allows the chip to use almost any available 
external system clock for full-speed operation while also supplying an output clock synchronized 
to a synthesized internal core clock. It improves the synchronous timing of the processors’ external 
memory port, eliminating the timing skew common on other processors.


• Invisible pipeline: The three-stage instruction pipeline is essentially invisible to the programmer, 
allowing straightforward program development in either assembly language or high-level 
languages such as C or C++.


• Instruction set: The instruction mnemonics are MCU-like, making the transition from 
programming microprocessors to programming the chip as easy as possible. New microcontroller 
instructions, addressing modes, and bit-field instructions allow for significant decreases in program 
code size. The orthogonal syntax controls the parallel execution units. The hardware DO loop 
instruction and the repeat (REP) instruction make writing straight-line code obsolete.


• Low power: Designed in CMOS, the DSP56800 Family inherently consumes very low power. 
Two additional low-power modes, stop and wait, further reduce power requirements. Wait is a 
low-power mode where the DSP56800 core is shut down but the peripherals and interrupt controller 
continue to operate so that an interrupt can bring the chip out of wait mode. In stop mode, even more 
of the circuitry is shut down for the lowest power-consumption mode. There are also several 
different ways to bring the chip out of stop mode.
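

The dynamic-range figures quoted under Precision in the list above can be checked with a short calculation. The sketch below assumes the usual rule that an N-bit binary word spans 20·log10(2^N) dB, roughly 6.02 dB per bit; the manual does not state which formula it used, so treat this as a plausibility check rather than its derivation.

```python
import math


def dynamic_range_db(bits):
    """Dynamic range in dB of a `bits`-wide binary word: 20*log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


print(round(dynamic_range_db(16), 1))   # 96.3  (quoted as 96 dB)
print(round(dynamic_range_db(36), 1))   # 216.7 (quoted as 216 dB)
```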


1.4 For the Latest Information 


For the latest electronic version of this document, as well as other DSP documentation (including user’s 
manuals, product briefs, data sheets, and errata) please consult the inside front cover of this manual for 
contact information for the following services: 


• Motorola MFAX™ service
• Motorola DSP World Wide Web site
• Motorola DSP Helpline


The MFAX service and the DSP Web site maintain the most current specifications, documents, and 
drawings. These two services are available on demand 24 hours a day. 


Chapter 2 
Core Architecture Overview 


The DSP56800 core architecture is a 16-bit multiple-bus processor designed for efficient real-time digital 
signal processing and general purpose computing. The architecture is designed as a standard 
programmable core from which various DSP integrated circuit family members can be designed with 
different on-chip and off-chip memory sizes and on-chip peripheral requirements. This chapter presents 
the overall core architecture and the general programming model. More detailed information on the data 
ALU, AGU, program controller, and JTAG/OnCE blocks within the architecture is found in later 
chapters. 


2.1 Core Block Diagram 


The DSP56800 core is composed of functional units that operate in parallel to increase the throughput of 
the machine. The program controller, AGU, and data ALU each contain their own register set and control 
logic, so each may operate independently and in parallel with the other two. Likewise, each functional unit 
interfaces with other units, with memory, and with memory-mapped peripherals over the core’s internal 
address and data buses. The architecture is pipelined to take advantage of the parallel units and 
significantly decrease the execution time of each instruction. 


For example, it is possible for the data ALU to perform a multiplication in a first instruction, for the AGU 
to generate up to two addresses for a second instruction, and for the program controller to be fetching a 
third instruction. In a similar manner, it is possible for the bit-manipulation unit to perform an operation of 
the third instruction described above in place of the multiplication in the data ALU. 
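

The overlap described in this example can be pictured with a toy model of a three-stage pipeline. The Python sketch below is an idealization with no stalls or multi-cycle instructions, and the stage labels are illustrative names taken from the example in the preceding paragraph, not the official pipeline stage names.

```python
STAGES = ("fetch", "address", "execute")   # illustrative labels only


def pipeline_snapshot(cycle, program):
    """Map each stage to the instruction occupying it during `cycle` (0-based).

    Instruction i enters the first stage in cycle i and advances one stage
    per cycle; this is an idealized model with no stalls.
    """
    snapshot = {}
    for depth, stage in enumerate(STAGES):
        index = cycle - depth
        snapshot[stage] = program[index] if 0 <= index < len(program) else None
    return snapshot


program = ["I1", "I2", "I3", "I4"]
print(pipeline_snapshot(2, program))
# {'fetch': 'I3', 'address': 'I2', 'execute': 'I1'}
```

In cycle 2 the first instruction is executing, addresses are being generated for the second, and the third is being fetched, which is the situation the text describes.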


The major components of the core are the following: 
• Data ALU
• AGU
• Program controller and hardware looping unit
• Bus and bit-manipulation unit
• OnCE debug port
• Address buses
• Data buses


Figure 2-1 on page 2-2 shows a block diagram of the CPU architecture. 


Figure 2-1. DSP56800 Core Block Diagram 


Note that Figure 2-1 illustrates two methods for connecting peripherals to the DSP56800 core: using the 
Motorola-standard IP-BUS interface or via a dedicated peripheral global data bus (PGDB). When the 
IP-BUS interface is used, peripheral registers may be memory mapped into any data (X) memory address 
range and are accessed with standard X-memory reads and writes. When the PGDB interface is used, 
peripheral registers are mapped to the last 64 locations in X memory and are accessed with a special 
memory addressing mode (see Section 4.2.4.3, “I/O Short Address (Direct Addressing): <pp>,” on 
page 4-23). 
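

For the PGDB case, the address window follows directly from the statement that the registers occupy the last 64 locations of X memory. The Python sketch below works this out for a 16-bit (65,536-location) data address space; the constant and function names are hypothetical, and the exact register layout within the window depends on the device.

```python
X_MEMORY_SIZE = 0x10000        # 65,536 locations addressed by 16 bits
PGDB_WINDOW = 64               # "last 64 locations in X memory"


def pgdb_range():
    """Return (first, last) X-memory address of the PGDB peripheral window."""
    return X_MEMORY_SIZE - PGDB_WINDOW, X_MEMORY_SIZE - 1


def is_pgdb_address(address):
    """True if a 16-bit X-memory address falls inside the PGDB window."""
    first, last = pgdb_range()
    return first <= address <= last


first, last = pgdb_range()
print("X:$%04X to X:$%04X" % (first, last))   # X:$FFC0 to X:$FFFF
print(is_pgdb_address(0xFFFB))                 # True
print(is_pgdb_address(0x0200))                 # False
```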


The interface method used to connect to peripherals is dependent on the specific DSP56800-based device 
being used. Consult your device user’s manual for more information on peripheral interfacing. 


2.1.1 Data Arithmetic Logic Unit (ALU) 


The data arithmetic logic unit (ALU) performs all of the arithmetic and logical operations on data 
operands. It consists of the following: 


• Three 16-bit input registers (X0, Y0, and Y1)
• Two 32-bit accumulator registers (A and B)
• Two 4-bit accumulator extension registers (A2 and B2)
• An accumulator shifter (AS)
• One data limiter
• One 16-bit barrel shifter
• One parallel (single cycle, non-pipelined) multiply-accumulator (MAC) unit


The data ALU is capable of multiplication, multiply-accumulation (with positive or negative 
accumulation), addition, subtraction, shifting, and logical operations in one instruction cycle. Arithmetic 
operations are done using two’s-complement fractional or integer arithmetic. Support is also provided for 
unsigned and multi-precision arithmetic. 


Data ALU source operands may be 16, 32, or 36 bits and may individually originate from input registers, 
memory locations, immediate data, or accumulators. ALU results are stored in one of the accumulators. In 
addition, some arithmetic instructions store their 16-bit results either in one of the three data ALU input 
registers or directly in memory. Arithmetic operations and shifts can have a 16-bit or a 36-bit result. 
Logical operations are performed on 16-bit operands and always yield 16-bit results. 
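

The interplay of 16-bit fractional operands, a 36-bit accumulator, and the data limiter can be illustrated with a simplified Python model. It assumes conventional two's-complement fractional arithmetic (16-bit operands with 15 fraction bits, products aligned by a one-bit left shift) and truncation instead of rounding; the function names are hypothetical, and the exact hardware rules are those given in Chapter 3, not this sketch.

```python
Q15_ONE = 1 << 15                     # 16-bit fractions are scaled by 2**15


def to_q15(value):
    """Convert a float in [-1, 1) to a 16-bit two's-complement fraction."""
    return max(-Q15_ONE, min(Q15_ONE - 1, int(round(value * Q15_ONE))))


def mac(acc, x, y):
    """Return acc + x*y, with acc held as a signed 36-bit value (31 fraction bits)."""
    acc += (x * y) << 1               # 15 + 15 fraction bits, shifted to 31
    assert -(1 << 35) <= acc < (1 << 35), "36-bit accumulator overflow"
    return acc


def limit_to_q15(acc):
    """Take the upper 16 bits of the 32-bit part, saturating if out of range."""
    word = acc >> 16                  # truncate; no rounding in this sketch
    return max(-Q15_ONE, min(Q15_ONE - 1, word))


acc = 0
for _ in range(3):
    acc = mac(acc, to_q15(0.75), to_q15(0.75))   # 3 * 0.5625 = 1.6875
print(acc / (1 << 31))           # 1.6875, held thanks to the extension bits
print(hex(limit_to_q15(acc)))    # 0x7fff, the saturated 16-bit result
```

The extension bits let the running sum exceed the fractional range without overflow; saturation is applied only when the result is narrowed to 16 bits.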


Data ALU register values can be transferred (read or write) across the core global data bus (CGDB) as 
16-bit operands. The X0 register value can also be written by X memory data bus two (XDB2) as a 16-bit 
operand. Refer to Chapter 3, “Data Arithmetic Logic Unit,” for a detailed description of the data ALU. 


2.1.2 Address Generation Unit (AGU) 


The address generation unit (AGU) performs all of the effective address calculations and address storage 
necessary to address data operands in memory. The AGU operates in parallel with other chip resources to 
minimize address-generation overhead. It contains two ALUs, allowing the generation of up to two 16-bit 
addresses every instruction cycle: one for either X memory address bus one (XAB1) or program address 
bus (PAB) and one for X memory address bus two (XAB2). The AGU can directly address 65,536 
locations on the XAB1 or XAB2 and 65,536 locations on the PAB, totaling 131,072 sixteen-bit data words. 
It supports a complete set of addressing modes. Its arithmetic unit can perform both linear and modulo 
arithmetic. 
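

The difference between linear and modulo address arithmetic can be shown with a conceptual Python sketch. It models only the idea of a pointer wrapping inside a circular buffer; the `base` and `length` parameters are assumptions of this example, and the actual AGU mechanism (the modifier register and its constraints) is described in Chapter 4.

```python
def update_pointer(pointer, step, base=None, length=None):
    """Return the next 16-bit address after adding `step`.

    Linear arithmetic when no buffer is given; otherwise the address wraps
    inside the circular buffer [base, base + length).
    """
    if length is None:
        return (pointer + step) & 0xFFFF          # linear, 16-bit wraparound
    offset = (pointer - base + step) % length     # modulo arithmetic
    return base + offset


BASE, LENGTH = 0x0100, 5                          # hypothetical 5-word buffer
print(hex(update_pointer(0x0104, 1)))                  # 0x105 (linear)
print(hex(update_pointer(0x0104, 1, BASE, LENGTH)))    # 0x100 (wraps to start)
print(hex(update_pointer(0x0100, -1, BASE, LENGTH)))   # 0x104 (wraps to end)
```

Modulo updates of this kind are what make circular buffers, such as the sample history of an FIR filter, cheap to maintain.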


The AGU contains the following registers and functional units: 
• Four address registers (R0-R3)
• A stack pointer register (SP)
• An offset register (N)
• A modifier register (M01)
• A modulo arithmetic unit
• An incrementer/decrementer unit


The address registers are 16-bit registers that may contain an address or data. Each address register can 
provide an address for the XAB1 and PAB address buses. For instructions that read two values from X data 
memory, R3 provides an address for the XAB2, and R0 or R1 provides an address for the XAB1. The 
modifier and offset registers are 16-bit registers that control updating of the address registers. The offset 
register can also be used to store 16-bit data. AGU registers may be read or written by the CGDB as 16-bit 
operands. Refer to Chapter 4, “Address Generation Unit,” for a detailed description of the AGU. 


2.1.3 Program Controller and Hardware Looping Unit 


The program controller performs the following: 
• Instruction prefetch
• Instruction decoding
• Hardware loop control
• Interrupt (exception) processing


Instruction execution is carried out in other core units such as the data ALU, AGU, or bit-manipulation 
unit. The program controller consists of the following: 


• A program counter unit
• Instruction latch and decoder
• Hardware looping control logic
• Interrupt control logic
• Status and control registers


Located within the program controller are the following: 


• Four user-accessible registers:
  - Loop address register (LA)
  - Loop count register (LC)
  - Status register (SR)
  - Operating mode register (OMR)
• A program counter (PC)
• A hardware stack (HWS)


In addition to the tasks listed above, the program controller also controls the memory map and operating 
mode. The operating mode and memory map are programmable via the OMR, and are established after 
reset by external interface pins. 


The HWS is a separate internal last-in-first-out (LIFO) buffer of two 16-bit words that stores the address of 
the first instruction in a hardware DO loop. When a new hardware loop is begun by executing the DO 
instruction, the address of the first instruction in the loop is stored (pushed) on the “top” location of the 
HWS, and the LF bit in the SR is set. The previous value of the loop flag (LF) bit is copied to the OMR’s 
NL bit. When an ENDDO instruction is encountered or a hardware loop terminates naturally, the 16-bit 
address in the “top” location of the HWS is discarded, and the LF bit is updated with the value in the 
OMR’s nested looping (NL) bit. 
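

The push and pop behavior just described can be captured in a small Python model. The class and method names are hypothetical. Two details are assumptions of this sketch and are not stated in the text above: NL is cleared when an entry is popped, and a third nested DO raises an error instead of modeling what the hardware does on overflow.

```python
class HardwareStack:
    """Toy model of the two-entry hardware stack (HWS) and the LF/NL bits."""

    DEPTH = 2

    def __init__(self):
        self.entries = []     # top of stack is the end of the list
        self.lf = 0           # loop flag (LF) bit in the SR
        self.nl = 0           # nested looping (NL) bit in the OMR

    def do(self, first_instruction_address):
        """Model a DO instruction starting a hardware loop."""
        if len(self.entries) == self.DEPTH:
            raise OverflowError("HWS holds only two loop addresses")
        self.entries.append(first_instruction_address & 0xFFFF)
        self.nl = self.lf     # previous LF value is copied to NL
        self.lf = 1           # LF is set

    def end_loop(self):
        """Model ENDDO or natural termination of the current loop."""
        self.entries.pop()    # top-of-stack address is discarded
        self.lf = self.nl     # LF is updated from NL
        self.nl = 0           # assumption: NL is cleared on a pop


hws = HardwareStack()
hws.do(0x0100)                # outer loop:  LF=1, NL=0
hws.do(0x0120)                # inner loop:  LF=1, NL=1
hws.end_loop()                # inner done:  LF=1, NL=0
hws.end_loop()                # outer done:  LF=0, NL=0
print(hws.lf, hws.nl, hws.entries)   # 0 0 []
```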


The program controller is described in detail in Chapter 5, “Program Controller.” For more details on 
program looping, refer to Section 5.3, “Program Looping,” on page 5-14 and Section 8.6, “Loops,” on 
page 8-20. For information on reset and interrupts, refer to Chapter 7, “Interrupts and the Processing 
States.” 


2.1.4 Bus and Bit-Manipulation Unit 


Transfers between internal buses are accomplished in the bus unit. The bus unit is similar to a switch 
matrix and can connect any two of the three internal data buses together without introducing delays. This 
allows data to be moved from program to data memory, for example. The bus unit is also used to transfer 
data to the PGDB on those devices that use it to connect to on-chip peripherals. 


The bit-manipulation unit performs bit-field manipulations on X (data) memory words, peripheral 
registers, and all registers within the DSP56800 core. It is capable of testing, setting, clearing, or inverting 
any bits specified in a 16-bit mask. For branch-on-bit-field instructions, this unit tests bits on the upper or 
lower byte of a 16-bit word (that is, the mask can only test up to 8 bits at a time). 
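

The mask-based operations of the bit-manipulation unit correspond to ordinary bitwise logic on 16-bit words. The Python sketch below shows the four operations and the byte restriction on branch masks; the function names are hypothetical (they are not instruction mnemonics), and the sketch models only the data result, not condition codes or timing.

```python
WORD_MASK = 0xFFFF                     # all operands are 16-bit words


def bf_set(word, mask):
    """Set every bit selected by the mask."""
    return (word | mask) & WORD_MASK


def bf_clear(word, mask):
    """Clear every bit selected by the mask."""
    return word & ~mask & WORD_MASK


def bf_invert(word, mask):
    """Invert every bit selected by the mask."""
    return (word ^ mask) & WORD_MASK


def bf_all_set(word, mask):
    """Test whether every bit selected by the mask is 1."""
    return (word & mask) == mask


def is_branch_mask(mask):
    """True if the mask lies entirely in the upper or the lower byte."""
    return (mask & 0xFF00) == 0 or (mask & 0x00FF) == 0


print(hex(bf_set(0x0000, 0x0007)))     # 0x7
print(hex(bf_clear(0x00FF, 0x000F)))   # 0xf0
print(hex(bf_invert(0x00F0, 0x00FF)))  # 0xf
print(bf_all_set(0x0007, 0x0005))      # True
print(is_branch_mask(0x0180))          # False: spans both bytes
```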


2.1.5 On-Chip Emulation (OnCE) Unit 


The On-Chip Emulation (OnCE) unit allows the user to interact in a debug environment with the 
DSP56800 core and its peripherals non-intrusively. Its capabilities include examining registers, on-chip 
peripheral registers or memory, setting breakpoints on program or data memory, and stepping or tracing 
instructions. It provides simple, inexpensive, and speed-independent access to the internal DSP56800 core 
by interacting with a user-interface program running on a host workstation for sophisticated debugging and 
economical system development. 


Dedicated pins through the JTAG port allow the user access to the DSP in a target system, retaining debug 
control without sacrificing other user-accessible on-chip resources. This technique eliminates the costly 
cabling and the access to processor pins required by traditional emulator systems. Refer to Chapter 9, 
“JTAG and On-Chip Emulation (OnCE™),” for a detailed description of the JTAG/OnCE port. Consult 
your development system’s documentation for information on debugging using the JTAG/OnCE port 
interface. 


2.1.6 Address Buses 


Addresses are provided to the internal X data memory on two unidirectional 16-bit buses, X memory 
address bus one (XAB1) and X memory address bus two (XAB2). Program memory addresses are 
provided on the 16-bit program address bus (PAB). Note that XAB1 can provide addresses for accessing 
both internal and external memory, whereas XAB2 can only provide addresses for accessing internal 
memory. 


2.1.7 Data Buses 


Inside the chip, data is transferred using the following: 
• Bidirectional 16-bit buses: 
   - Core global data bus (CGDB) 
   - Program data bus (PDB) 
   - Peripheral data bus (PGDB), implemented on DSP56800 Family devices that do not use the 
     IP-BUS interface for peripherals 
• One unidirectional 16-bit bus: X memory data bus two (XDB2) 


Data transfer between the data ALU and the X data memory uses the CGDB when one memory access is 
performed. When two simultaneous memory reads are performed, the transfers use the CGDB and the 
XDB2. All other data transfers occur using the CGDB, except transfers to and from peripherals on 
DSP56800-based devices that implement the PGDB peripheral data bus. Instruction word fetches occur 
simultaneously over the PDB. The bus structure supports general register-to-register moves, 
register-to-memory moves, and memory-to-register moves, and can transfer up to three 16-bit words in the 
same instruction cycle. Transfers between buses are accomplished in the bus and bit-manipulation unit. As 
a general rule, when any register less than 16 bits wide is read, the unused bits are read as zeros. Reserved 
and unused bits should always be written with zeros to ensure future compatibility. 


2.2 Memory Architecture 


The DSP56800 has a dual Harvard memory architecture, with separate program and data memory spaces. 
Each address space supports up to 2^16 (65,536) memory words. Dedicated address and data buses for each 
address space allow for simultaneous accesses to both program memory and data memory. There is also 
support for a second read-only data path to data memory. In DSP56800 Family devices that implement this 
second bus, it is possible to initiate two simultaneous data read operations, allowing for a total of three 
parallel memory accesses. 


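[Figure: the program memory space spans $0 to $FFFF (64K, or 2^16, words), with interrupt vectors at 
$0 to $7F (0 to 127). The X data memory space spans $0 to $FFFF (64K, or 2^16, words), with the region 
from $FFC0 (64K - 64) to $FFFF optimized for peripherals.] 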

Figure 2-2. DSP56800 Memory Spaces 


Locations $0 through $007F in the program memory space are available for reset and interrupt vectors. 
Peripheral registers are located in the data memory address space as memory-mapped registers. This 
peripheral space can be located anywhere in the data address space, although the address range 
$FFC0-$FFFF is frequently used because an addressing mode optimized for this region provides faster 
access; however, the location of the peripheral space is dependent on the specific system implementation 
of the DSP56800 core. See Section 4.2.4.3, “I/O Short Address (Direct Addressing): <pp>,” on page 4-23 
for more information. 


2.3 Blocks Outside the DSP56800 Core 


The following blocks are optionally found on DSP56800-based DSP chips and are considered peripheral 
and memory blocks, not part of the DSP56800 core. These and other blocks are described in greater detail 
in the appropriate chip-specific user’s manual. Figure 2-3 shows an example DSP56800-based device. 
Note that this device uses the Motorola IP-BUS interface to connect to peripherals. Other chips may use 
the PGDB peripheral bus. 


[Figure: block diagram of a sample DSP56800 Family chip. Surviving labels include the IRQA and IRQB 
pins and a 16-bit data bus.] 


Figure 2-3. Sample DSP56800-Family Chip Block Diagram 


2.3.1 External Data Memory 


External data memory (data RAM, data ROM, or both) can be added around the core on a chip. Addresses 
are received from the XAB1 and XAB2. Data transfers occur on the CGDB and XDB2. One read, one 
write, or two reads can be performed during one instruction cycle using the internal data memory. 
Depending upon the particular on-chip peripherals found on a device, some portion of the data address 
space may be reserved for peripheral registers, and not be accessible as external data memory. A total of 
65,536 memory locations can be addressed. 


2.3.2 Program Memory 


Program memory (program RAM, program ROM, or both) can be added around the core on a chip. 
Addresses are received from the PAB and data transfers occur on the PDB. The first 128 locations of the 
program memory are available for interrupt vectors, although it is not necessary to use all 128 locations for 
interrupt vectors. Some can be used for the user program if desired. The number of locations required for 
an application depends on what peripherals on the chip are used by an application and the locations of their 
corresponding interrupt vectors. The program memory may be expanded off chip, and up to 65,536 
locations can be addressed. 


2.3.3 Bootstrap Memory 


A program bootstrap ROM is usually found on chips that have on-chip program RAM instead of ROM. 

The bootstrap ROM is used for initially loading application code into the on-chip program RAM so it can 
be run from there. Refer to Section 5.1.9.1, “Operating Mode Bits (MB and MA)—Bits 1-0,” on page 5-10 
and to the user’s manual of the particular DSP chip for a description of the different bootstrapping modes. 


2.3.4 IP-BUS Bridge 


Some devices based on the DSP56800 architecture connect to on-chip peripherals using the 
Motorola-standard IP-BUS interface. These devices contain an IP-BUS bridge unit, which allows 
peripherals to be accessed using the CGDB data bus and XAB1 address bus. Peripheral registers are 
memory-mapped into the data address space. Consult the appropriate DSP56800-based device User’s 
Manual for more information on peripheral interfacing for a particular chip. 


2.3.5 Phase Lock Loop (PLL) 


The phase lock loop (PLL) allows the DSP chip to use an external clock different from the internal system 
clock, while optionally supplying an output clock synchronized to a synthesized internal clock. This PLL 
allows full-speed operation using an external clock running at a different speed. The PLL performs 
frequency multiplication and skew elimination, and it reduces overall system power by reducing the 
frequency of the input reference clock. 


2.4 DSP56800 Core Programming Model 


The registers in the DSP56800 core that are considered part of the DSP56800 core programming model are 
shown in Figure 2-4 on page 2-9. There may also be other important registers that are not included in the 
DSP56800 core, but mapped into the data address space. These include registers for peripheral devices and 
other functions that are not bound into the core. 


Figure 2-4. DSP56800 Core Programming Model 


Chapter 3 
Data Arithmetic Logic Unit 


This chapter describes the architecture and the operation of the data arithmetic logic unit (ALU), the block 
where the multiplication, logical operations, and arithmetic operations are performed. (Addition can also 
be performed in the address generation unit, and the bit-manipulation unit can perform logical operations.) 
The data ALU contains the following: 

• Three 16-bit input registers (X0, Y0, and Y1) 
• Two 32-bit accumulator registers (A and B) 
• Two 4-bit accumulator extension registers (A2 and B2) 
• An accumulator shifter (AS) 
• One data limiter 
• One 16-bit barrel shifter 
• One parallel (single cycle, non-pipelined) multiply-accumulator (MAC) unit 


Multiple buses in the data ALU allow complex arithmetic operations (such as multiply-accumulate 
operations) to be performed in parallel with up to two memory transfers. This chapter also discusses 
fractional and integer data representations; signed, unsigned, and multi-precision arithmetic; condition 
code generation; and the rounding modes used in the data ALU. 


The data ALU can perform the following operations in a single instruction cycle: 

• Multiplication (with or without rounding) 
• Multiplication with inverted product (with or without rounding) 
• Multiplication and accumulation (with or without rounding) 
• Multiplication and accumulation with inverted product (with or without rounding) 
• Addition and subtraction 
• Compares 
• Increments and decrements 
• Logical operations (AND, OR, and EOR) 
• One’s-complement 
• Two’s-complement (negation) 
• Arithmetic and logical shifts 
• Rotates 
• Multi-bit shifts on 16-bit values 
• Rounding 
• Absolute value 


3.1 Overview and Architecture 


The major components of the data ALU are the following: 

• Three 16-bit input registers (X0, Y0, and Y1) 
• Two 32-bit accumulator registers (A and B) 
• Two 4-bit accumulator extension registers (A2 and B2) 
• An accumulator shifter (AS) 
• One data limiter 
• One 16-bit barrel shifter 
• One parallel (single cycle, non-pipelined) multiply-accumulator (MAC) unit 


A block diagram of the data ALU unit is shown in Figure 3-1 on page 3-3, and its corresponding 
programming model is shown in Figure 3-2 on page 3-4. In the programming model, accumulator “A” 
refers to the entire 36-bit accumulator register, whereas “A2,” “A1,” and “A0” refer to the directly 
accessible extension, most significant, and least significant portions of the 36-bit accumulator, 
respectively. Instructions can access the register as a whole or by these individual portions (see 
Section 3.1.2, “Data ALU Accumulator Registers,” on page 3-4 and Section 3.2, “Accessing the 
Accumulator Registers,” on page 3-7). The blocks and registers within the data ALU are explained in the 
following sections. 


Figure 3-1. Data ALU Block Diagram 


Figure 3-2. Data ALU Programming Model 


3.1.1 Data ALU Input Registers (X0, Y1, and Y0) 


The data ALU registers (X0, Y1, and Y0) are 16-bit registers that serve as inputs for the data ALU. Each 
register may be read or written by the CGDB as a word operand. They may be treated as three independent 
16-bit registers, or as one 16-bit register and one 32-bit register. Y1 and Y0 can be concatenated to form 
the 32-bit register Y, with Y1 being the most significant word and Y0 being the least significant word. 
Figure 3-2 shows this arrangement. 


These data ALU input registers are used as source operands for most data ALU operations and allow new 
operands to be loaded from the memory for the next instruction while the register contents are used by the 
current instruction. X0 may also be written by the XDB2 during the dual read instruction. Certain 
arithmetic operations also allow these registers to be specified as destinations. 


3.1.2 Data ALU Accumulator Registers 


The two 36-bit data ALU accumulator registers can be accessed either as a 36-bit register (A or B) or as the 
following individual portions of the register: 


• 4-bit extension register (A2 or B2) 
• 16-bit MSP (A1 or B1) 
• 16-bit LSP (A0 or B0) 

The three individual portions make up the entire accumulator register, as shown in Figure 3-2. 
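
The relationship between the three portions and the full register can be sketched in Python as follows. This is an illustrative model under the assumption that a 36-bit accumulator is held in an ordinary integer with the extension in bits 35-32, the MSP in bits 31-16, and the LSP in bits 15-0; the function names are hypothetical.

```python
def pack_acc(ext, msp, lsp):
    """Combine F2 (4 bits), F1 (16 bits), and F0 (16 bits) into a 36-bit pattern."""
    return ((ext & 0xF) << 32) | ((msp & 0xFFFF) << 16) | (lsp & 0xFFFF)


def unpack_acc(acc):
    """Split a 36-bit pattern into its (F2, F1, F0) portions."""
    return (acc >> 32) & 0xF, (acc >> 16) & 0xFFFF, acc & 0xFFFF


def acc_to_signed(acc):
    """Interpret a 36-bit pattern as a two's-complement integer."""
    acc &= (1 << 36) - 1
    return acc - (1 << 36) if acc & (1 << 35) else acc
```

With these helpers, the pattern $F:A987:0000 unpacks to the portions $F, $A987, and $0000, and its signed value equals the signed 16-bit value of $A987 scaled by 2^16.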


These two techniques for accessing the accumulator registers provide important flexibility for both DSP 
algorithms and general-purpose computing tasks. Accessing these registers as entire accumulators (A or B) 
is particularly useful for DSP tasks, because this preserves the full precision of multiplication and other 
ALU operations. Data limiting and saturation are also possible using the full registers, in cases where the 
final result of a computation that has overflowed is moved (see Section 3.4.1, “Data Limiter,” on page 
3-26). 


Accessing an accumulator through its individual portions (A2, A1, A0, B2, B1, or B0) is useful for systems 
and control programming. When accumulators are manipulated using their constituent components, 
saturation and limiting are disabled. This allows for microcontroller-like 16-bit integer processing for 
non-DSP purposes. 


Section 3.2, “Accessing the Accumulator Registers,” provides a complete discussion of the ways in which 
the accumulators can be employed. A description of the data limiting and saturation features of the data 
ALU is provided in Section 3.4, “Saturation and Data Limiting.” 


3.1.3 Multiply-Accumulator (MAC) and Logic Unit 


The multiply-accumulator (MAC) and logic unit is the main arithmetic processing unit of the DSP. This is 
the block that performs all multiplication, addition, subtraction, logical, and other arithmetic operations 
except shifting. It accepts up to three input operands and outputs one 36-bit result of the form 
EXT:MSP:LSP (extension:most significant product:least significant product). Arithmetic operations in the 
MAC unit occur independently and in parallel with memory accesses on the CGDB, XDB2, and PDB. The 
data ALU registers provide pipelining for both data ALU inputs and outputs. An input register may be 
written by memory in the same instruction where it is used as the source for a data ALU operation. The 
inputs of the MAC and logic unit can come from the X and Y registers (X0, Y1, Y0), the accumulators 
(A1, B1, A, B), and also directly from memory for common instructions such as ADD and SUB. 


The multiplier executes 16-bit x 16-bit parallel signed/unsigned fractional and 16-bit x 16-bit parallel 
signed integer multiplications. The 32-bit product is added either to the 36-bit contents of the A or B 
accumulator or to the 16-bit contents of the X0, Y0, or Y1 register, and the result is stored in the same 
register. This multiply-accumulate is a single cycle operation (no pipeline). For integer multiplication, 
the 16 LSBs of the product are stored in the MSP of the accumulator; the extension register is filled with 
sign extension and the LSP of the accumulator remains unchanged. 
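
The following Python sketch models the two multiplication behaviors described above. It is a simplified illustration, not a cycle-accurate model. The function names are hypothetical, and the 1-bit left shift applied to the fractional product is an assumption about fractional arithmetic that this section does not spell out (see the discussion of fractional arithmetic elsewhere in this chapter).

```python
ACC_MASK = (1 << 36) - 1  # a 36-bit accumulator held in an int


def s16(word):
    """Interpret a 16-bit pattern as a signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def frac_mac(acc, a, b):
    """Fractional multiply-accumulate: acc + ((a * b) << 1), wrapped to 36 bits.

    The left shift by one aligns the product of two signed fractions with
    the accumulator format (assumption noted in the lead-in).
    """
    product = (s16(a) * s16(b)) << 1
    return (acc + product) & ACC_MASK


def int_mpy_into(acc, a, b):
    """Integer multiply: 16 LSBs of the product go to the MSP, the extension
    is sign extended, and the LSP of the accumulator is left unchanged."""
    low16 = (s16(a) * s16(b)) & 0xFFFF
    ext = 0xF if low16 & 0x8000 else 0x0
    return (ext << 32) | (low16 << 16) | (acc & 0xFFFF)
```

In this model, multiplying $4000 by $4000 (0.5 x 0.5) from a cleared accumulator leaves $2000 (0.25) in the MSP, and an integer multiply of 3 by 5 leaves $000F in the MSP while the previous LSP contents survive.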


If a multiply without accumulation is specified by a MPY or MPYR instruction, the unit clears the 
accumulator and then adds the contents to the product. The results of all arithmetic instructions are valid 
(sign extended) 36-bit operands in the form EXT:MSP:LSP (A2:A1:A0 or B2:B1:B0). 


When a 36-bit result is to be stored as a 16-bit operand, the LSP can simply be truncated, or it can be 
rounded into the MSP. The rounding performed is either the convergent rounding (round to the nearest 
even) or two’s-complement rounding. The type of rounding is specified by the rounding bit in the 
operating mode register. See Section 3.5, “Rounding,” for a more detailed discussion of rounding. 
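
The difference between the two rounding methods shows up only when the LSP is exactly $8000, that is, when the value lies halfway between two 16-bit results. The Python sketch below illustrates this under stated assumptions: the accumulator is modeled as a 36-bit integer, the LSP is cleared after rounding as a modeling choice, and the function names are hypothetical. Section 3.5 remains the authoritative description.

```python
ACC_MASK = (1 << 36) - 1


def round_twos_complement(acc):
    """Add one half of an MSP LSB, then clear the LSP."""
    rounded = (acc + 0x8000) & ACC_MASK
    return rounded & ~0xFFFF


def round_convergent(acc):
    """Round to nearest; on an exact tie, round to the even MSP value."""
    tie = (acc & 0xFFFF) == 0x8000
    rounded = (acc + 0x8000) & ACC_MASK
    if tie:
        rounded &= ~0x10000  # force the LSB of the MSP (bit 16) to zero
    return rounded & ~0xFFFF
```

For the tie case $0:1234:8000, two's-complement rounding in this model yields an MSP of $1235, while convergent rounding yields the even value $1234. For any LSP other than $8000 the two functions agree.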


The logic unit performs the logical operations AND, OR, EOR, and NOT on data ALU registers. It is 16 
bits wide and operates on data in the MSP of the accumulator. The least significant and EXT portions of 
the accumulator are not affected. Logical operations can also be performed in the bit-manipulation unit. 
The bit-manipulation unit is used when performing logical operations with immediate values; these 
operations can be performed on any register or memory location. 


3.1.4 Barrel Shifter 


The 16-bit barrel shifter performs single-cycle, 0- to 15-bit arithmetic or logical shifts of 16-bit data. Since 
both the amount to be shifted as well as the value to shift come from registers, it is possible to shift data by 
a variable amount. See Figure 3-3 on page 3-6. It is also possible to use this unit to right shift 32-bit values 
using the ASRAC and LSRAC instructions, as demonstrated in Section 8.2, “16- and 32-Bit Shift 
Operations,” on page 8-8. 


Figure 3-3. Right and Left Shifts Through the Multi-Bit Shifting Unit 


After shifting, the extension register is always loaded with zero extension for logical shifts or sign 
extension for arithmetic shifts. For right shifts, the LSP is set to zero except for the ASRAC and LSRAC 
instructions, where the lower bits are shifted into the LSP. For left shifts, the upper bits are not shifted into 
the extension register, and the LSP is always set to zero. 
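
The shifting rules above can be sketched as a small Python model. This is a simplified illustration that covers only the plain multi-bit shifts (not ASRAC or LSRAC). It assumes that "sign extension" refers to the sign of the shifted 16-bit result, and the function name and argument conventions are hypothetical.

```python
def shift16(value, count, direction, arithmetic):
    """Shift a 16-bit value by 0 to 15 bits and return (EXT, MSP, LSP).

    direction is "left" or "right"; arithmetic selects an arithmetic
    rather than a logical shift.
    """
    if not 0 <= count <= 15:
        raise ValueError("shift count must be 0 to 15")
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    value &= 0xFFFF
    if direction == "left":
        # Bits shifted past bit 15 are dropped, not moved into the extension.
        msp = (value << count) & 0xFFFF
    elif arithmetic:
        signed = value - 0x10000 if value & 0x8000 else value
        msp = (signed >> count) & 0xFFFF
    else:
        msp = value >> count
    # Zero extension for logical shifts, sign extension for arithmetic shifts.
    ext = 0xF if (arithmetic and msp & 0x8000) else 0x0
    return ext, msp, 0x0000  # the LSP is set to zero in these cases
```

Because the shift count is an ordinary argument here, the sketch also reflects the point that the amount to shift can be a run-time value held in a register.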


3.1.5 Accumulator Shifter 


The accumulator shifter is an asynchronous parallel shifter with a 36-bit input and a 36-bit output. The 
operations performed by this unit are as follows: 

• No shift performed: ADD, SUB, MAC, and so on 
• 1-bit left shift: ASL, LSL, ROL 
• 1-bit right shift: ASR, LSR, ROR 
• Force to zero: MPY, IMPY(16) 

The output of the shifter goes directly to the MAC unit as an input. 


3.1.6 Data Limiter and MAC Output Limiter 


The data ALU contains two units that implement optional saturation of mathematical results, the Data 
Limiter and the MAC Output Limiter. The Data Limiter saturates values when data is moved out of an 
accumulator with a move instruction or parallel move. The MAC Output Limiter saturates the output of the 
data ALU’s MAC unit. 


Section 3.4, “Saturation and Data Limiting,” provides an in-depth discussion of saturation and limiting, as 
well as a description of the operation of the two limiter units. 


3.2 Accessing the Accumulator Registers 


An accumulator register can be accessed in two different ways: 
• As an entire register (F) 
• By the individual register portion (F2, F1, or F0) 


The ability to access the accumulator registers in both ways provides important flexibility, allowing for 
powerful DSP algorithms as well as general-purpose computing tasks. 


Accessing an entire accumulator register (A or B) is particularly useful for DSP tasks, since it preserves the 
complete 36-bit register—and thus the entire precision of a multiplication or other ALU operation. It also 
provides limiting (or saturation) capability in cases when storing a result of a computation that would 
overflow the destination size. See Section 3.4, “Saturation and Data Limiting.” 


Accessing an accumulator through its individual portions (F2, F1, or F0) is useful for systems and control 
programming. For example, if a DSP algorithm is in progress and an interrupt is received, it is usually 
necessary to save every accumulator used by the interrupt service routine. Since an interrupt can occur at 
any step of the DSP task (that is, right in the middle of a DSP algorithm), it is important that no saturation 
takes place. Thus, an interrupt service routine can store the individual accumulator portions on the stack, 
effectively saving the entire 36-bit value without any limiting. Upon completion of the interrupt routine, 
the contents of the accumulator can be exactly restored from the stack. 


The DSP56800 instruction set transparently supports both methods of access. An entire accumulator may 
be accessed simply through the specification of the full-register name (A or B), while portions are accessed 
through the use of their respective names (A0, B1, and so on). 


Table 3-1 provides a summary of the various access methods. These are described in more detail in 
Section 3.2.1, “Accessing an Accumulator by Its Individual Portions,” and Section 3.2.2, “Accessing an 
Entire Accumulator.” 


Table 3-1. Accessing the Accumulator Registers 


Registers A and B 
    Read of an accumulator register: 
        For a MOVE instruction: 
            If the extension bits are not in use for the accumulator to be read, then the 16-bit 
            contents of the F1 portion of the accumulator are read onto the CGDB bus. 
            If the extension bits are in use, then a 16-bit “limited” value is instead read onto 
            the CGDB. See Section 3.4.1, “Data Limiter.” 
        When used in an arithmetic operation: 
            All 36 bits are sent to the MAC unit without limiting. 
    Write to an accumulator register: 
        For a MOVE instruction: 
            The 16 bits of the CGDB bus are written into the 16-bit F1 portion of the register. 
            The extension portion of the same accumulator, F2, is filled with sign extension. 
            The F0 portion is set to zero. 

Registers A2 and B2 
    Read of an accumulator register: 
        For a MOVE instruction: 
            The 4-bit register is read onto the 4 LSBs of the CGDB bus. 
            The upper 12 bits of the bus are sign extended. 
            See Figure 3-5 on page 3-9. 
    Write to an accumulator register: 
        For a MOVE instruction: 
            The 4 LSBs of the CGDB are written into the 4-bit register; the upper 12 bits are ignored. 
            The corresponding F1 and F0 portions are not modified. 
            See Figure 3-4 on page 3-8. 

Registers A1 and B1 
    Read of an accumulator register: 
        For a MOVE instruction: 
            The 16-bit F1 portion is read onto the CGDB bus. 
        When used in an arithmetic operation: 
            The F1 register is used as a 16-bit source operand for an arithmetic operation. 
        F1 can be used in the following: 
            MOVE 
            Parallel Move 
            Several different arithmetic instructions 
    Write to an accumulator register: 
        For a MOVE instruction: 
            The contents of the CGDB bus are written into the 16-bit F1 register. 
            The corresponding F2 and F0 portions are not modified. 

Registers A0 and B0 
    Read of an accumulator register: 
        For a MOVE instruction: 
            The 16-bit F0 register is read onto the CGDB bus. 
    Write to an accumulator register: 
        For a MOVE instruction: 
            The contents of the CGDB bus are written into the 16-bit F0 register. 
            The corresponding F2 and F1 portions are not modified. 


In all cases in Table 3-1 where a MOVE operation is specified, it is understood that the function is 
identical for parallel moves and bit-field operations. 


3.2.1 Accessing an Accumulator by Its Individual Portions 


The instruction set provides instructions for loading and storing one of the portions of an accumulator 
register without affecting the other two portions. When an instruction uses the F1 or F0 notation instead of 
F, the instruction only operates on the 16-bit portion specified without modifying the other two portions. 
When an instruction specifies F2, then the instruction operates only on the 4-bit accumulator extension 
register without modifying the F1 or F0 portions of the accumulator. Refer to Table 3-1 for a summary of 
accessing the accumulator registers. 


Data limiting, as outlined in Section 3.4, “Saturation and Data Limiting,” is enabled only when an entire 
accumulator is being stored to memory. When only a portion of an accumulator is being stored (by using 
an instruction which specifies F2, F1, or F0), limiting through the data limiter does not occur. 


When F2 is written, the register receives the low-order portion of the word; the high-order portion is not 
used. See Figure 3-4. 


[Figure: register F2 used as a destination. Bits 3-0 (the LSBs) of the 16-bit word are written to the 4-bit 
register F2; bits 15-4 of the word are not used.] 


Figure 3-4. Writing the Accumulator Extension Registers (F2) 


When F2 is read, the register contents occupy the low-order portion (bits 3-0) of the word; the high-order 
portion (bits 15-4) is sign extended. See Figure 3-5. 


[Figure: register F2 used as a source. The 4-bit contents of F2 appear in bits 3-0 (the LSBs) of the 16-bit 
word; bits 15-4 of the word carry sign extension.] 


Figure 3-5. Reading the Accumulator Extension Registers (F2) 
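
The F2 write and read behavior can be summarized in a few lines of Python. This is a behavioral sketch only; it assumes the 36-bit accumulator is held in an integer with F2 in bits 35-32, and the function names are hypothetical.

```python
def write_f2(acc, bus_word):
    """Write F2 from a 16-bit word: only bits 3-0 are used; F1 and F0 are kept."""
    return ((bus_word & 0xF) << 32) | (acc & 0xFFFFFFFF)


def read_f2(acc):
    """Read F2 as a 16-bit word: contents in bits 3-0, bits 15-4 sign extended."""
    ext = (acc >> 32) & 0xF
    return (ext | 0xFFF0) if (ext & 0x8) else ext
```

Writing the word $ABCD to F2 therefore stores only $D, and reading it back produces $FFFD because bit 3 of the extension register is set.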


Figure 3-6 shows the result of writing values to each portion of the accumulator. Note that only the portion 
specified in the instruction is modified; the other two portions remain unchanged. 


In the following examples, X denotes an arbitrary hexadecimal digit. 


Writing the F2 Portion                 Example: MOVE #$ABCD,A2 

                     A2 (35-32)   A1 (31-16)   A0 (15-0) 
Before Execution     X            XXXX         XXXX 
After Execution      D            XXXX         XXXX 


Writing the F1 Portion                 Example: MOVE #$1234,A1 

                     A2 (35-32)   A1 (31-16)   A0 (15-0) 
Before Execution     X            XXXX         XXXX 
After Execution      X            1234         XXXX 


Writing the F0 Portion                 Example: MOVE #$A987,A0 

                     A2 (35-32)   A1 (31-16)   A0 (15-0) 
Before Execution     X            XXXX         XXXX 
After Execution      X            XXXX         A987 


Figure 3-6. Writing the Accumulator by Portions 


See Section 3.2, “Accessing the Accumulator Registers,” for a discussion of when it is appropriate to 
access an accumulator by its individual portions and when it is appropriate to access it as an entire 
accumulator. 


3.2.2 Accessing an Entire Accumulator 


3.2.2.1 Accessing for Data ALU Operations 


The complete accumulator is accessed to provide a source, a destination, or both for an ALU or 
multiplication operation in the data ALU. In this case, the accumulator is written as an entire 36-bit 
accumulator (F), not as an individual register (F2, F1, or F0). The accumulator registers receive the 
EXT:MSP:LSP of the multiply-accumulator unit output when used as a destination and supply a source 
accumulator of the same form. Most data ALU operations specify the 36-bit accumulator registers as 
source operands, destination operands, or both. 


3.2.2.2 Writing an Accumulator with a Small Operand 


Automatic sign extension of the 36-bit accumulators is provided when the accumulator is written with a 
smaller size operand. This can occur when writing F from the CGDB (MOVE instruction) or with the 
results of certain data ALU operations (for example, ADD, SUB, or TFR from a 16-bit register to a 36-bit 
accumulator). If a word operand is to be written to an accumulator register (F), the F1 portion of the 
accumulator is written with the word operand, the LSP is zeroed, and the EXT portion receives sign 
extension. This is also the case for a MOVE instruction that moves one accumulator to another, but is not 
the case for a TFR instruction that moves one entire accumulator to another. No sign extension is 
performed if an individual 16-bit register is written (F1 or F0). 
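
The contrast between writing a word to the full accumulator and writing it to the F1 portion alone can be sketched in Python as follows. The model assumes a 36-bit accumulator held in an integer; the function names are hypothetical.

```python
def move_word_to_acc(word):
    """Write a 16-bit word to F: F1 gets the word, F0 is zeroed, F2 is sign extended."""
    word &= 0xFFFF
    ext = 0xF if word & 0x8000 else 0x0
    return (ext << 32) | (word << 16)


def move_word_to_f1(acc, word):
    """Write a 16-bit word to F1 only: F2 and F0 keep their previous contents."""
    return (acc & 0xF0000FFFF) | ((word & 0xFFFF) << 16)
```

Writing $A987 to the full accumulator produces $F:A987:0000 in this model, whereas writing it to F1 of an accumulator holding $3:5555:6666 produces $3:A987:6666, which no longer represents the 16-bit value on its own.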


NOTE: 


A read of the F1 register in a MOVE instruction is identical to a read of the 
F accumulator for the case where the extension bits of that accumulator 
only contain sign-extension information. In this case there is no need for 
saturation or limiting, so reading the F accumulator produces the same 
result as reading the F1 register. 


3.2.2.3 Extension Registers as Protection Against Overflow 


The F2 extension registers offer protection against 32-bit overflow. When the result of an accumulation 
crosses the MSB of MSP (bit 31 of F), the extension bit of the status register (E) is set. Up to 15 overflows 
or underflows are possible using these extension bits, after which the sign is lost beyond the MSB of the 
extension register. When this occurs, the overflow bit (V) in the status register is set. Having an extension 
register allows overflow during intermediate calculations without losing important information. This is 
particularly useful during execution of DSP algorithms, when intermediate calculations (but not the final 
result that is written to memory or to a peripheral) may sometimes overflow. 


The logic detection of “extension register in use” is also used to determine when to saturate the value of an 
accumulator when it is being read onto the CGDB or transferred to any data ALU register. If saturation 
occurs, the content of the original accumulator is not affected (except if the same accumulator is specified 
as both source and destination); only the value transferred over the CGDB is limited to a full-scale positive 
or negative 16-bit value ($7FFF or $8000). 


When limiting occurs, a flag is set and latched in the status register (L). The limiting block is explained in 
more detail in Section 3.4.1, “Data Limiter.” 
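
The "extension register in use" test and the resulting limiting can be sketched in Python. This is a simplified model: it assumes the extension is in use whenever bits 35-31 of the accumulator are not all equal, it returns the limiting indication as a flag instead of modeling the status register, and the function names are hypothetical. Section 3.4.1 remains the authoritative description.

```python
ACC_MASK = (1 << 36) - 1


def acc_add(acc, value):
    """Add a signed integer to a 36-bit accumulator pattern, wrapping at 36 bits."""
    return (acc + value) & ACC_MASK


def extension_in_use(acc):
    """True when bits 35-31 differ, so the value no longer fits in 32 bits."""
    return ((acc >> 31) & 0x1F) not in (0x00, 0x1F)


def read_full_accumulator(acc):
    """Model a move of F onto the bus: return (bus_word, limited)."""
    if not extension_in_use(acc):
        return (acc >> 16) & 0xFFFF, False
    negative = bool((acc >> 35) & 1)
    return (0x8000 if negative else 0x7FFF), True
```

An intermediate sum such as $0:8000:0000 has its extension in use and would be read as the limited value $7FFF, but a later subtraction can bring the accumulator back into the 32-bit range, after which the MSP is read unchanged.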


NOTE: 


Limiting will be performed only when the entire 36-bit accumulator 
register (F) is specified as the source for a parallel data move or a register 
transfer. It is not performed when F2, F1, or F0 is specified. 


3.2.2.4 Examples of Writing the Entire Accumulator 


Figure 3-7 shows the result of writing a 16-bit signed value to an entire accumulator. Note that all three 
portions of the accumulator are modified. The LSP (B0) is set to zero, and the extension portion (B2) is 
appropriately sign extended. 


In the following examples, X denotes an arbitrary hexadecimal digit. 


Writing a Positive Value into 36-Bit Accumulator      Example: MOVE #$1234,B 

                     B2 (35-32)   B1 (31-16)   B0 (15-0) 
Before Execution     X            XXXX         XXXX 
After Execution      0            1234         0000 


Writing a Negative Value into 36-Bit Accumulator      Example: MOVE #$A987,B 

                     B2 (35-32)   B1 (31-16)   B0 (15-0) 
Before Execution     X            XXXX         XXXX 
After Execution      F            A987         0000 


Figure 3-7. Writing the Accumulator as a Whole 


Successfully using the DSP56800 Family requires a full understanding of the methods and implications of 
the various accumulator-register access methods. The architecture of the accumulator registers offers a 
great deal of flexibility and power, but it is necessary to completely understand the access mechanisms 
involved to fully exploit this power. 


3.2.3 General Integer Processing 


General integer and control processing typically involves manipulating 16- and 32-bit integer quantities. 
Rarely will such code use a full 36-bit accumulator such as that implemented by the DSP56800 Family. 
The architecture of the DSP56800 supports the manipulation of 16-bit integer quantities using the 
accumulators, but care must be taken when performing such manipulation. 


3.2.3.1 Writing Integer Data to an Accumulator 


When loading an accumulator, it is most desirable for the 36 bits of the accumulator to correctly reflect the 
16-bit data. To this end, it is recommended that all accumulator loads of 16-bit data clear the least 
significant portion of the accumulator and also sign extend the extension portion. This can be 
accomplished through specifying the full accumulator register as the destination of the move, as shown in 
Example 3-1. 


Example 3-1. Loading an Accumulator with a Word for Integer Processing 


        MOVE    X:(R0),A        ; A2 receives sign extension 
                                ; A1 receives the 16-bit data 
                                ; A0 receives the value $0000 


Loading a 16-bit integer value into the A1 portion of the register is generally discouraged. In almost all 
cases, it is preferable to follow Example 3-1 on page 3-11. One notable exception is when 36-bit 
accumulator values must be stored temporarily. See Section 3.2.5, “Saving and Restoring Accumulators,” 
for more details. 


3.2.3.2 Reading Integer Data from an Accumulator 


Integer and control processing algorithms typically involve the manipulation of 16-bit quantities that 
would be adversely affected by saturation or limiting. When such integer calculations are performed, it is 
often desirable not to have overflow protection when results are stored to memory. To ensure that the data 
ALU’s data limiter is not active when an accumulator is being read, it is necessary to store not the full 
accumulator, but just the MSP (A1 portion). See Example 3-2. 


Example 3-2. Reading a Word from an Accumulator for Integer Processing 


        MOVE    A1,X:Variable_1 ; Saturation is disabled 


Note that with the use of the A1 register instead of the A register, saturation is disabled. The value in A1 is 
written “as is” to memory. 


3.2.4 Using 16-Bit Results of DSP Algorithms 


A DSP algorithm may use the full 36-bit precision of an accumulator while performing DSP calculations 
such as digital filtering or matrix multiplications. Upon completion of the algorithm, however, sometimes 
the result of the calculation must be saved in a 16-bit memory location or must be written to a 16-bit D/A 
converter. Since DSP algorithms process digital signals, it is important that when the 36-bit accumulator 
value is converted to a 16-bit value, saturation is enabled so signals that overflow 16 bits are appropriately 
clipped to the maximum positive or negative value. See Example 3-3. 


Example 3-3. Correctly Reading a Word from an Accumulator to a D/A 


MOVE A,X:D_to_A_data ; Saturation is enabled


Note the use of the A accumulator instead of the A1 register. Using the A accumulator enables saturation.


3.2.5 Saving and Restoring Accumulators 


Interrupt service routines offer one example of a time when it is critical that an accumulator be saved and 
restored without being altered in any way. Since an interrupt can occur at any time, the exact usage of an 
accumulator at that instant is unknown, so it cannot be altered by the interrupt service routine without 
adversely affecting any calculation that may have been in progress. In order for an accumulator to be saved 
and restored correctly, it must be done with limiting disabled. This is accomplished through sequentially 
saving and restoring the individual parts of the register, and not the whole register at once. See
Example 3-4 on page 3-13.




Example 3-4. Correct Saving and Restoring of an Accumulator—Word Accesses 


; Saving the A Accumulator to the Stack
LEA (SP)+ ; Point to first empty location
MOVE A2,X:(SP)+ ; Save extension register
MOVE A1,X:(SP)+ ; Save F1 register
MOVE A0,X:(SP) ; Save F0 register

; Restoring the A Accumulator from the Stack
MOVE X:(SP)-,A0 ; Restore F0 register
MOVE X:(SP)-,A1 ; Restore F1 register
MOVE X:(SP)-,A2 ; Restore extension register


It is important that interrupt service routines do not use the MOVE A,X:(SP)+ instruction when saving to
the stack. This instruction operates with saturation enabled, and may inadvertently store the value $7FFF
or $8000 onto the stack, according to the rules employed by the Data Limiter. This could have catastrophic
effects on any DSP calculation that was in progress.
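The difference between the two ways of saving an accumulator can be sketched with a small host-side model. This is a hedged Python illustration, not DSP56800 code: the helper names are hypothetical, the stack is a plain list, and the limiter rule is simplified to "saturate whenever A2 holds anything other than the sign extension of A1".

```python
def limited_word(a2, a1):
    """Word that a saturating MOVE A,<dest> would store (A0 is ignored)."""
    sign_only = 0xF if a1 & 0x8000 else 0x0
    if a2 == sign_only:                      # extension register not in use
        return a1
    return 0x8000 if a2 & 0x8 else 0x7FFF    # clip according to the sign


def save_by_parts(acc):
    """Save A2, A1, and A0 one at a time; nothing passes through the limiter."""
    return list(acc)


def restore_by_parts(stack):
    return tuple(stack)


def save_whole(acc):
    """Save with a single limited move of the full accumulator."""
    a2, a1, _a0 = acc
    return [limited_word(a2, a1)]


def restore_whole(stack):
    word = stack[0]
    return (0xF if word & 0x8000 else 0x0, word, 0x0000)


in_progress = (0x1, 0x4000, 0x1234)          # extension register is in use
```

In this model, `restore_by_parts(save_by_parts(in_progress))` returns the original tuple, whereas the whole-register path returns `(0x0, 0x7FFF, 0x0000)`: both the extension bits and the A0 contents of the interrupted calculation are lost.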


3.2.6 Bit-Field Operations on Integers in Accumulators 


When bit-manipulation operations on accumulator registers are performed, as is done for integer
processing, care must be taken. The bit-manipulation instructions operate as a “Read-Modify-Write”
sequence, and thus may be affected by limiting during the “Read” portion of this sequence. In order for
bit-manipulation operations to generate the expected results, limiting must be disabled. To ensure that this
is the case, the MSP (A1 portion) of an accumulator should be used as the target operand for the ANDC,
EORC, ORC, NOTC, BFCLR, BFCHG, and BFSET instructions, not the full accumulator. See
Example 3-5.


Example 3-5. Bit Manipulation on an Accumulator 


; BFSET using the A1 register
BFSET #$0F00,A1 ; Reads A1 with saturation disabled
                ; Sets bits 11 through 8 and stores back to A1
                ; Note: A2 and A0 unmodified

; BFSET using the A register
BFSET #$0F00,A ; Reads A1 with saturation enabled - may limit
               ; Sets bits 11 through 8 and stores back to A1
               ; A2 is sign extended and A0 is cleared


Since the BFTSTH, BFTSTL, BRCLR, and BRSET instructions only test the accumulator value and do
not modify it, it is recommended to do these operations on the A1 register where no limiting can occur
when integer processing is performed.
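The two BFSET forms in Example 3-5 can be compared with a short host-side sketch. The Python below is an illustration under stated assumptions rather than a description of the hardware: the function names are hypothetical, and the behavior of each form is taken from the comments in Example 3-5.

```python
def limited_word(a2, a1):
    """Word read from the full accumulator with the data limiter active."""
    sign_only = 0xF if a1 & 0x8000 else 0x0
    if a2 == sign_only:                      # extension register not in use
        return a1
    return 0x8000 if a2 & 0x8 else 0x7FFF


def bfset_a1(mask, acc):
    """BFSET #mask,A1: read-modify-write on A1 only."""
    a2, a1, a0 = acc
    return (a2, a1 | mask, a0)               # A2 and A0 unmodified


def bfset_a(mask, acc):
    """BFSET #mask,A: limited read, then write back as a word."""
    a2, a1, _a0 = acc
    result = limited_word(a2, a1) | mask
    return (0xF if result & 0x8000 else 0x0, result, 0x0000)
```

With the accumulator holding `(0x0, 0x8001, 0x00FF)`, the A1 form produces `(0x0, 0x8F01, 0x00FF)` as expected, while the A form first limits the read to $7FFF and leaves `(0x0, 0x7FFF, 0x0000)`.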


3.2.7 Converting from 36-Bit Accumulator to 16-Bit Portion 


There are two types of instructions that are useful for converting the 36-bit contents of an accumulator to a 
16-bit value, which can then be stored to memory or used for further computations. This is useful for 
processing word-sized operands (16 bits), since it guarantees that an accumulator contains correct sign 
extension and that the least significant 16 bits are all zeros. The two techniques are shown in Example 3-6 
on page 3-14.


Example 3-6. Converting a 36-Bit Accumulator to a 16-Bit Value


; Converting with No Limiting
MOVE A1,A ; Sign Extend A2, A0 set to $0000
MOVE A1,B ; Sign Extend B2, B0 set to $0000

; Converting with Limiting Enabled
MOVE A,A ; Sign Extend A2, Limit if Required
MOVE A,B ; Sign Extend B2, Limit if Required


Where limiting is enabled, as in the second example in Example 3-6, limiting only occurs when the 
extension register is in use. You can determine if the extension register is in use by examining the 
extension bit (E) of the status register. Refer to Section 5.1.8, “Status Register,” on page 5-6. 


3.3 Fractional and Integer Data ALU Arithmetic 


The ability to perform both integer and fractional arithmetic is one of the strengths of the DSP56800 
architecture; there is a need for both types of arithmetic. 


Fractional arithmetic is typically required for computation-intensive algorithms such as digital filters, 
speech coders, vector and array processing, digital control, and other signal-processing tasks. In this mode 
the data is interpreted as fractional values, and the computations are performed interpreting the data as 
fractional. Often, saturation is used when performing calculations in this mode to prevent the severe 
distortion that occurs in an output signal generated from a result where a computation overflows without 
saturation (see Figure 3-14 on page 3-28). Saturation can be selectively enabled or disabled so that 
intermediate calculations can be performed without limiting, and limiting is only done on final results (see 
Example 3-7). 


Example 3-7. Fractional Arithmetic Examples 


0.5 x 0.25 = 0.125 
0.625 + 0.25 = 0.875 
0.125/0.5 = 0.25 
0.5 >> 1 = 0.25


Integer arithmetic, on the other hand, is invaluable for controller code, for array indexing and address 
computations, compilers, peripheral setup and handling, bit manipulation, bit-exact algorithms, and other 
general-purpose tasks. Typically, saturation is not used in this mode, but is available if desired. (See 
Example 3-8.) 


Example 3-8. Integer Arithmetic Examples 


4 x 3 = 12
1201 + 79 = 1280
63 / 9 = 7
100 << 1 = 200


The main difference between fractional and integer representations is the location of the decimal (or
binary) point. For fractional arithmetic, the decimal (or binary) point is always located immediately to the
right of the MSP’s most significant bit; for integer values, it is always located immediately to the right of
the value’s LSB. Figure 3-8 on page 3-15 shows the location of the decimal point (binary point), bit
weightings, and operand alignments for the different fractional and integer representations supported on
the DSP56800 architecture.


Figure 3-8. Bit Weightings and Operand Alignments


The representations of numbers allowed on the DSP56800 architecture are as follows:
• Two’s-complement values
• Fractional or integer values
• Signed or unsigned values
• Word (16-bit), long word (32-bit), or accumulator (36-bit)


The different representations not only affect the arithmetic operations, but also the condition code 
generation. These numbers can be represented as decimal, hexadecimal, or binary numbers. 


To maintain alignment of the binary point when a word operand is written to accumulator A or B, the
operand is written to the most significant accumulator register (A1 or B1) and its most significant bit is
automatically sign extended through the accumulator extension register. The least significant accumulator
register is automatically cleared.


Some of the advantages of fractional data representation are as follows:
• The MSP (left half) has the same format as the input data.
• The LSP (right half) can be rounded into the MSP without shifting or updating the exponent.
• Conversion to floating-point representation is easier because the industry-standard floating-point
formats use fractional mantissas.
• Coefficients for most digital filters are derived as fractions by DSP digital-filter design software
packages. The results from the DSP design tools can be used without the extensive data conversions
that other formats require.
• A significant bit is not lost through sign extension.


3.3.1 Interpreting Data 


Data in a memory location or register can be interpreted as fractional or integer, depending on the needs of 
a user’s program. Table 3-2 shows how a 16-bit value can be interpreted as either a fractional or integer
value, depending on the location of the binary point. 


Table 3-2. Interpretation of 16-Bit Data Values

Binary                   Hexadecimal      Integer Value   Fractional Value
Representation¹          Representation   (decimal)       (decimal)
0.100 0000 0000 0000     $4000             16,384          0.5
0.010 0000 0000 0000     $2000              8,192          0.25
0.001 0000 0000 0000     $1000              4,096          0.125
0.111 0000 0000 0000     $7000             28,672          0.875
0.000 0000 0000 0000     $0000                  0          0.0
1.100 0000 0000 0000     $C000            -16,384         -0.5
1.110 0000 0000 0000     $E000             -8,192         -0.25
1.111 0000 0000 0000     $F000             -4,096         -0.125
1.001 0000 0000 0000     $9000            -28,672         -0.875

1. This corresponds to the location of the binary point when the data is interpreted as fractional. If
the data is interpreted as integer, the binary point is located immediately to the right of the LSB.

The following equation shows the relationship between a 16-bit integer and a fractional value:

\[ \text{Fractional Value} = \text{Integer Value} / 2^{15} \]

There is a similar equation relating 36-bit integers and fractional values:

\[ \text{Fractional Value} = \text{Integer Value} / 2^{31} \]


Table 3-3 shows how a 36-bit value can be interpreted as either an integer or a fractional value, depending 
on the location of the binary point. 




Table 3-3. Interpretation of 36-Bit Data Values

Hexadecimal       36-Bit Integer in      16-Bit Integer   Fractional
Representation¹   Entire Accumulator     in MSP           Value
                  (decimal)              (decimal)        (decimal)
$7 FFFF FFFF       34,359,738,367        -                ~ 16.0
$1 4000 0000        5,368,709,120        -                2.5
$0 4000 0000        1,073,741,824         16,384          0.5
$0 2000 0000          536,870,912          8,192          0.25
$0 0000 0000                    0              0          0.0
$F C000 0000       -1,073,741,824        -16,384          -0.5
$F E000 0000         -536,870,912         -8,192          -0.25
$E C000 0000       -5,368,709,120        -                -2.5
$8 0000 0001      -34,359,738,367        -                ~ -16.0

1. When the accumulator extension registers are in use, the data contained in the accumulators
cannot be stored exactly in memory or other registers. In these cases the data must be limited
to the most positive or most negative number consistent with the size of the destination.
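The two conversion equations above can be checked against Tables 3-2 and 3-3 with a few lines of host-side Python. This is only an illustrative sketch; the function names are hypothetical and are not part of any DSP56800 tool.

```python
def to_signed(raw, bits):
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw & (1 << (bits - 1)) else raw


def word_as_fraction(word16):
    """16-bit word: Fractional Value = Integer Value / 2**15."""
    return to_signed(word16, 16) / 2**15


def accumulator_as_fraction(acc36):
    """36-bit accumulator: Fractional Value = Integer Value / 2**31."""
    return to_signed(acc36, 36) / 2**31
```

For example, `word_as_fraction(0x9000)` evaluates to -0.875 and `accumulator_as_fraction(0x140000000)` evaluates to 2.5, matching the corresponding table rows.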


3.3.2 Data Formats 


Four types of two’s-complement data formats are supported by the 16-bit DSP core: 
• Signed fractional
• Unsigned fractional
• Signed integer
• Unsigned integer


The ranges for each of these formats, discussed in the following subsections, apply to all data stored in 
memory and to data stored in the data ALU registers. The extension registers associated with the 
accumulators allow word growth so that the most positive signed fractional number that can be represented 
in an accumulator is approximately 16.0 and the most negative signed fractional number is -16.0 as shown 
in Table 3-3. An important factor to consider is that when the accumulator extension registers are in use, 
the data contained in the accumulators cannot be stored exactly in memory or other registers. In these cases 
the data must be limited to the most positive or most negative number consistent with the size of the 
destination and the sign of the accumulator, the MSB of the extension register. 


3.3.2.1 Signed Fractional 
In this format the N-bit operand is represented using the 1.[N-1] format (1 sign bit, N-1 fractional bits).
Signed fractional numbers lie in the following range:

\[ -1.0 \le SF \le +1.0 - 2^{-[N-1]} \]

For words and long-word signed fractions, the most negative number that can be represented is -1.0, whose
internal representation is $8000 and $80000000, respectively. The most positive word is $7FFF or
\(1.0 - 2^{-15}\), and the most positive long word is $7FFFFFFF or \(1.0 - 2^{-31}\).


3.3.2.2 Unsigned Fractional 


Unsigned fractional numbers may be thought of as positive only. The unsigned numbers have nearly twice 
the magnitude of a signed number with the same number of bits. Unsigned fractional numbers lie in the 
following range: 


\[ 0.0 \le UF \le 2.0 - 2^{-[N-1]} \]


Examples of unsigned fractional numbers are 0.25, 1.25, and 1.999. The binary word is interpreted as
having a binary point after the MSB. The most positive 16-bit unsigned number is $FFFF or
\(1.0 + (1.0 - 2^{-15}) = 1.99996948\). The smallest unsigned number is zero ($0000).


3.3.2.3 Signed Integer

This format is used when data is being processed as integers. Using this format, the N-bit operand is
represented using the N.0 format (N integer bits). Signed integer numbers lie in the following range:

\[ -2^{[N-1]} \le SI \le [2^{[N-1]} - 1] \]

For words and long-word signed integers the most negative word that can be represented is -32768
($8000), and the most negative long word is -2147483648 ($80000000). The most positive word is 32767
($7FFF), and the most positive long word is 2147483647 ($7FFFFFFF).


3.3.2.4 Unsigned Integer 


Unsigned integer numbers may be thought of as positive only. The unsigned numbers have nearly twice 
the magnitude of a signed number of the same length. Unsigned integer numbers lie in the following range: 


\[ 0 \le UI \le [2^{N} - 1] \]


Examples of unsigned integer numbers are 25, 125, and 1999. The binary word is interpreted as having a
binary point immediately to the right of the LSB. The most positive 16-bit unsigned integer is 65535
($FFFF). The smallest unsigned number is zero ($0000).
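The four ranges given in Section 3.3.2 can be tabulated for any operand width. The following Python sketch is offered only as a cross-check of the formulas; the function name and dictionary keys are hypothetical.

```python
def format_ranges(n):
    """Return (minimum, maximum) for each N-bit data format."""
    return {
        "signed_fractional": (-1.0, 1.0 - 2.0 ** -(n - 1)),
        "unsigned_fractional": (0.0, 2.0 - 2.0 ** -(n - 1)),
        "signed_integer": (-(2 ** (n - 1)), 2 ** (n - 1) - 1),
        "unsigned_integer": (0, 2 ** n - 1),
    }


word = format_ranges(16)        # 16-bit word
long_word = format_ranges(32)   # 32-bit long word
```

Here `word["unsigned_integer"]` is `(0, 65535)` and `long_word["signed_integer"]` is `(-2147483648, 2147483647)`, in agreement with the limits quoted in the text.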


3.3.3 Addition and Subtraction 


For fractional and integer arithmetic, the operations are performed identically for addition, subtraction, or 
comparing two values. This means that any add, subtract, or compare instruction can be used for both 
fractional and integer values. 


To perform fractional or integer arithmetic operations with word-sized data, the data is loaded into the 
MSP (A1 or B1) of the accumulator as shown in Figure 3-9. 


Before Execution                After Execution
A2   A1      A0                 A2   A1      A0
$0   $0020   $0000              $0   $0060   $0000

X0   $0040                      X0   $0040

MOVE #64,X0 ; Load integer value 64 ($40) into X0
MOVE #32,A ; Load integer value 32 ($20) into A Accumulator
           ; (correctly sign extends into A2 and zeros A0)
ADD X0,A ; Perform Integer Word Addition
MOVE A1,X:RESULT ; Save Result (without saturating) to Memory


Figure 3-9. Word-Sized Integer Addition Example

Fractional word-sized arithmetic would be performed in a similar manner. For arithmetic operations where
the destination is a 16-bit register or memory location, the fractional or integer operation is correctly
calculated and stored in its 16-bit destination.




3.3.4 Logical Operations 


For fractional and integer arithmetic, the logical operations (AND, OR, EOR, and bit-manipulation 
instructions) are performed identically. This means that any DSP56800 logical or bit-field instruction can 
be used for both fractional and integer values. Typically, logical operations are only performed on integer 
values, but there is no inherent reason why they cannot be performed on fractional values as well. 


Likewise, shifting can be done on both integer and fractional data values. For both of these, an arithmetic 
left shift of 1 bit corresponds to a multiplication by two. An arithmetic right shift of 1 bit corresponds to a 
division of a signed value by two, and a logical right shift of 1 bit corresponds to a division of an unsigned 
value by two. 


3.3.5 Multiplication 


The multiplication operation is not the same for integer and fractional arithmetic. The result of a fractional 
multiplication differs in a simple manner from the result of an integer multiplication. This difference 
amounts to a 1-bit shift of the final result, as illustrated in Figure 3-10. Any binary multiplication of two
N-bit signed numbers gives a signed result that is 2N-1 bits in length. This 2N-1 bit result must then be 
correctly placed into a field of 2N bits to correctly fit into the on-chip registers. For correct fractional 
multiplication, an extra 0 bit is placed at the LSB to give a 2N bit result. For correct integer multiplication, 
an extra sign bit is placed at the MSB to give a 2N bit result. 


[Figure: signed multiplication of two N-bit operands gives a product of 2N - 1 bits. Integer: the signed
multiplier output is sign extended at the MSB to fill the 2N-bit MSP:LSP field. Fractional: the signed
multiplier output is zero filled at the LSB to fill the 2N-bit MSP:LSP field.]


Figure 3-10. Comparison of Integer and Fractional Multiplication 


The MPY, MAC, MPYR, and MACR instructions perform fractional multiplication and fractional 
multiply-accumulation. The IMPY(16) instruction performs integer multiplication. Section 3.3.5.2, 
“Integer Multiplication,” explains how to perform integer multiplication. 
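The 1-bit relationship between the two kinds of product can be sketched numerically. The Python below is a host-side illustration with hypothetical function names; it models only the arithmetic on 16-bit operands, not the MPY or IMPY(16) instructions themselves.

```python
def to_signed16(word):
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def integer_product(x, y):
    """Integer alignment: the 2N-1 bit product with an extra sign bit at the MSB."""
    return to_signed16(x) * to_signed16(y)


def fractional_product(x, y):
    """Fractional alignment: the same product shifted left once, LSB zero filled."""
    return integer_product(x, y) << 1
```

Interpreting the 32-bit fractional result with the binary point after the sign bit, `fractional_product(0x4000, 0x2000) / 2**31` gives 0.125 (0.5 x 0.25). Shifting a fractional-aligned result right by one bit recovers the integer product, so `fractional_product(4, 3) >> 1` is 12.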


3.3.5.1 Fractional Multiplication 


Figure 3-11 on page 3-20 shows the multiply-accumulation implementation for fractional arithmetic. The
multiplication of two 16-bit signed fractional operands gives an intermediate 32-bit signed fractional
result with the LSB always set to zero. This intermediate result is added to one of the 36-bit accumulators.
If rounding is specified in the MPY or MAC instruction (MACR or MPYR), the intermediate results will
be rounded to 16 bits before being stored back to the destination accumulator, and the LSP will be set to
zero.

[Figure: two 16-bit signed fractional input operands feed the signed multiplier; the multiplier result is
accumulated into the 36-bit signed fractional MPY result (EXP:MSP:LSP).]


Figure 3-11. MPY Operation—Fractional Arithmetic 


3.3.5.2 Integer Multiplication 


Two techniques for performing integer multiplication on the DSP core are as follows: 
• Using the IMPY(16) instruction to generate a 16-bit result in the MSP of an accumulator
• Using the MPY and MAC instructions to generate a 36-bit full precision result

Each technique has its advantages for different types of computations. 


An examination of the instruction set shows that for execution of single precision operations, most often 
the instructions operate on the MSP (bits 31-16) of the accumulator instead of the LSP (bits 15-0). This is
true for the LSL, LSR, ROL, ROR, NOT, INCW, and DECW instructions and others. Likewise, for the 
parallel MOVE instructions, it is possible to move data to and from the MSP of an accumulator, but this is 
not true for the LSP. Thus, an integer multiplication instruction that places its result in the MSP of an 
accumulator allows for more efficient computing. This is the reason why the IMPY(16) instruction places 
its results in bits 31-16 of an accumulator. The limitation with the IMPY(16) instruction is that the result 
must fit within 16 bits or there is an overflow. 


Figure 3-12 on page 3-21 shows the multiply operation for integer arithmetic. The multiplication of two
16-bit signed integer operands using the IMPY(16) instruction gives a 16-bit signed integer result that is
placed in the MSP (A1 or B1) of the accumulator. The corresponding extension register (A2 or B2) is filled
with sign extension and the LSP (A0 or B0) remains unchanged.

[Figure: two 16-bit signed integer input operands feed the signed multiplier; the 16-bit signed integer
result of the intermediate multiplier output is placed in the MSP, with sign extension in the extension
register.]


Figure 3-12. Integer Multiplication (IMPY) 


At other times it is necessary to maintain the full 32-bit precision of an integer multiplication. To obtain 
integer results, an MPY instruction is used, immediately followed by an ASR instruction. The 32-bit long 
integer result is then correctly located into the MSP and LSP of an accumulator with correct sign extension 
in the extension register of the same accumulator (see Example 3-9). 


Example 3-9. Multiplying Two Signed Integer Values with Full Precision 


MPY X0,Y0,A ; Generates correct answer shifted
            ; 1 bit to the left
ASR A ; Leaves Correct 32-bit Integer
      ; Result in the A Accumulator
      ; and the A2 register contains
      ; correct sign extension


When a multiply-accumulate is performed on a set of integer numbers, there is a faster way of generating
the result than performing an ASR instruction after each multiply. The technique is to use fractional
multiply-accumulates for the bulk of the computation and to then convert the final result back to integer.
See Example 3-10.


Example 3-10. Fast Integer MACs using Fractional Arithmetic 


MOVE X:(R0)+,Y0 X:(R3)+,X0
DO #N,LABEL
MAC X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0
LABEL
ASR A ; Convert to Integer only after MACs are
      ; completed
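The reason a single final shift is enough can be seen in a small host-side model. This Python sketch uses hypothetical function names and unbounded Python integers, so it assumes the running sum never exceeds the 36-bit accumulator; it is not a cycle-accurate model of MAC or ASR.

```python
def fractional_mac(xs, ys):
    """Accumulate fractional-aligned products, as a MAC loop would."""
    acc = 0
    for x, y in zip(xs, ys):
        acc += (x * y) << 1      # each product is the integer product doubled
    return acc


def integer_dot_product(xs, ys):
    """One arithmetic right shift after the loop converts the sum to integer."""
    return fractional_mac(xs, ys) >> 1
```

Because every fractional product is exactly twice the integer product, the accumulated sum is exactly twice the integer sum, and the final shift loses nothing. For instance, `integer_dot_product([3, -4, 5], [7, 2, -6])` evaluates to -17.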


3.3.6 Division 


Fractional and integer division of both positive and signed values is supported using the DIV instruction. 
The dividend (numerator) is a 32-bit fractional or 31-bit integer value, and the divisor (denominator) is a 
16-bit fractional or integer value, respectively. See Section 8.4, “Division,” on page 8-13 for a complete 
discussion of division.


3.3.7 Unsigned Arithmetic


Unsigned arithmetic can be performed on the DSP56800 architecture. The addition, subtraction, and 
compare instructions work for both signed and unsigned values, but the condition code computation is 
different. Likewise, there is a difference for unsigned multiplication. 


3.3.7.1 Conditional Branch Instructions for Unsigned Operations


Unsigned arithmetic is supported on operations such as addition, subtraction, comparison, and logical 
operations using the same ADD, SUB, CMP, and other instructions used for signed computations. The 
operations are performed the same for both representations. The difference lies both in which status bits 
are used in comparing signed and unsigned numbers and in how the data is interpreted, for which see 
Section 3.3.2, “Data Formats.” 


Four additional Bcc instruction variants are provided for branching based on the comparison of two 
unsigned numbers. These variants are: 


• HS (High or same): unsigned greater than or equal to
• LS (Low or same): unsigned less than or equal to
• HI (High): unsigned greater than
• LO (Low): unsigned less than


The variants used for comparing unsigned numbers, HS, LS, HI, and LO, are used in place of GE, LE, GT, 
and LT respectively, which are used for comparing signed numbers. Note that the HS condition is exactly 
the same as the carry clear (CC), and that LO is exactly the same as carry set (CS). 


Unsigned comparisons are enabled when the CC bit in the OMR register is set. When this bit is set, the 
value in the extension register is ignored when generating the C, V, N, and Z condition codes, and the 
condition codes are set using only the 32 LSBs of the result. Typically, this mode is very useful for 
controller and compiled code. 


NOTE: 


The unsigned branch condition variants (HS, LS, HI, and LO) may only be 
used when the CC bit is set in the program controller’s OMR register. If 
this bit is not set, then these condition codes should not be used. 


In cases where it is necessary to maintain all 36 bits of the result and the extension register is required, any 
unsigned numbers must first be converted to signed when loaded into the accumulator using the technique 
in Section 8.1.6, “Unsigned Load of an Accumulator,” on page 8-7. In these cases, the extension register 
will contain the correct value, and since values are now signed, it is possible to use the signed branch 
conditions: GE, LE, GT, or LT. Typically, this mode is more useful for DSP code. 
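Why separate branch conditions are needed can be shown with two 16-bit patterns whose ordering depends on interpretation. The Python sketch below is a simplified host-side illustration with hypothetical names; it models only 16-bit patterns and a borrow out of the subtraction, and leaves out the CC bit and the extension register.

```python
def to_signed16(word):
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def compare(a, b):
    """Return (lt, lo): signed less-than and unsigned lower for a versus b."""
    borrow = (((a & 0xFFFF) - (b & 0xFFFF)) >> 16) & 1   # carry from a - b
    lt = to_signed16(a) < to_signed16(b)
    lo = bool(borrow)                                    # LO matches carry set
    return lt, lo
```

`compare(0xFFFF, 0x0001)` returns `(True, False)`: as signed values -1 is less than 1, but as unsigned values 65535 is not lower than 1. Swapping the operands gives `(False, True)`.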


3.3.7.2 Unsigned Multiplication 


Unsigned multiplications are supported with the MACSU and MPYSU instructions. If only one operand is 
unsigned, then these instructions can be used directly. If both operands are unsigned, an 
unsigned-times-unsigned multiplication is performed using the technique demonstrated in Example 3-11 
on page 3-23.


Example 3-11. Multiplying Two Unsigned Fractional Values


MOVE X:FIRST,X0 ; Get first operand from memory
ANDC #$7FFF,X0 ; Force first operand to be positive
MOVE X:SECOND,Y0 ; Get second operand from memory
MPYSU X0,Y0,A
TSTW X:FIRST ; Perform final addition if MSB of first operand was a one
BGE OVER ; If first operand is less than one, jump to OVER
MOVE #$0,B
MOVE Y0,B1 ; Move Y0 to B without sign extension
ADD B,A
OVER
(ASR A) ; Optionally convert to integer result
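The correction step in Example 3-11 can be verified with a host-side model. The Python below is a sketch under assumptions: the function name is hypothetical, MPYSU is modeled as a signed-by-unsigned product in fractional alignment (shifted left once), and the accumulator is an unbounded Python integer.

```python
def unsigned_times_unsigned(first, second):
    """Model of Example 3-11; returns the fractional-aligned accumulator."""
    x0 = first & 0x7FFF              # ANDC #$7FFF,X0: MSB cleared
    acc = (x0 * second) << 1         # MPYSU X0,Y0,A
    if first & 0x8000:               # TSTW X:FIRST / BGE OVER
        acc += second << 16          # ADD B,A with B = $0:Y0:$0000
    return acc
```

Clearing the MSB removes a term of $8000 times the second operand from the product; in fractional alignment that missing term equals the second operand placed in the MSP, which is exactly what the ADD B,A restores. Shifting the returned value right by one bit (the optional ASR A) yields the integer product `first * second`.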


3.3.8 Multi-Precision Operations 


The DSP56800 instruction set contains several instructions which simplify extended- and multi-precision 
mathematical operations. By using these instructions, 64-bit and 96-bit calculations can be performed, and 
calculations involving different-sized operands are greatly simplified. 


3.3.8.1 Multi-Precision Addition and Subtraction 


Two instructions, ADC and SBC, assist in performing multi-precision addition (Example 3-12) and 
subtraction (Example 3-13), such as 64-bit or 96-bit operations. 


Example 3-12. 64-Bit Addition 


X:$1:X:$0:Y1:Y0 + A2:A1:A0:B1:B0 = A2:A1:A0:B1:B0
(B2 must contain only sign extension before addition begins;
that is, bits 35-31 are all 1s or 0s)

MOVE X:$21,B ; Correct sign extension
MOVE X:$20,B0
ADD Y,B ; First 32-bit addition
MOVE X:$0,Y0 ; Get second 32-bit operand from memory
MOVE X:$1,Y1
ADC Y,A ; Second 32-bit addition


Example 3-13. 64-Bit Subtraction 


A2:A1:A0:B1:B0 - X:$1:X:$0:Y1:Y0 = A2:A1:A0:B1:B0
(B2 must contain only sign extension before subtraction begins;
that is, bits 35-31 are all 1s or 0s)

MOVE X:$21,B ; Correct sign extension
MOVE X:$20,B0
SUB Y,B ; First 32-bit subtraction
MOVE X:$0,Y0 ; Get second 32-bit operand from memory
MOVE X:$1,Y1
SBC Y,A ; Second 32-bit subtraction
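The carry chaining that ADC and SBC provide can be sketched as two 32-bit steps. This Python model is illustrative only: the function names are hypothetical, the operands are treated as plain 64-bit patterns, and the accumulator extension bits are ignored.

```python
MASK32 = 0xFFFFFFFF


def add64(a, b):
    """64-bit addition as a low 32-bit add followed by an add with carry."""
    low = (a & MASK32) + (b & MASK32)                      # first 32-bit addition
    carry = low >> 32
    high = ((a >> 32) & MASK32) + ((b >> 32) & MASK32) + carry   # ADC step
    return ((high & MASK32) << 32) | (low & MASK32)


def sub64(a, b):
    """64-bit subtraction as a low 32-bit subtract followed by a subtract with borrow."""
    low = (a & MASK32) - (b & MASK32)                      # first 32-bit subtraction
    borrow = 1 if low < 0 else 0
    high = ((a >> 32) & MASK32) - ((b >> 32) & MASK32) - borrow  # SBC step
    return ((high & MASK32) << 32) | (low & MASK32)
```

A carry out of the low half is the only information the upper half needs: `add64(0x00000001FFFFFFFF, 1)` gives `0x0000000200000000`, and `sub64` of that result and 1 returns the original value.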


3.3.8.2 Multi-Precision Multiplication 


Two instructions are provided to assist with multi-precision multiplication. When these instructions are 
used, the multiplier accepts one signed and one unsigned two’s-complement operand. The instructions are: 


• MPYSU: multiplication with one signed and one unsigned operand
• MACSU: multiply-accumulate with one signed and one unsigned operand


The use of these instructions in multi-precision multiplication is demonstrated in Figure 3-13, with
corresponding examples shown in Example 3-14, Example 3-15 on page 3-24, and Example 3-16 on
page 3-25.

[Figure: a 16-bit operand times a 32-bit operand. A signed x unsigned partial product and a signed x signed
partial product (X0 x Y1, with sign extension) combine into a 48-bit result.]

Figure 3-13. Single-Precision Times Double-Precision Signed Multiplication


Example 3-14. Fractional Single-Precision Times Double-Precision Value - Both Signed


(5 Icyc, 5 Instruction Words)

MPYSU X0,Y0,A ; Single Precision times Lower Portion
MOVE A0,B
MOVE A1,A0 ; 16-bit Arithmetic Right Shift
MOVE A2,A1 ; (note that A2 contains only sign extension)
MAC X0,Y1,A ; Single Precision times Upper Portion
            ; and added to Previous
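The sequence in Example 3-14 can be followed step by step in a host-side model. The Python below is a hedged sketch: the function name is hypothetical, registers are modeled as unbounded Python integers, and the 48-bit result is assumed to be read as A1:A0:B1 with its sign in A2.

```python
def mul_s16_by_s32_fractional(x0, y):
    """Signed 16-bit x0 times signed 32-bit y, fractional alignment."""
    y1 = y >> 16                 # upper word of y, treated as signed (Y1)
    y0 = y & 0xFFFF              # lower word of y, treated as unsigned (Y0)
    a = (x0 * y0) << 1           # MPYSU X0,Y0,A
    b1 = a & 0xFFFF              # MOVE A0,B: keep the lowest 16 bits
    a >>= 16                     # MOVE A1,A0 / MOVE A2,A1: arithmetic shift by 16
    a += (x0 * y1) << 1          # MAC X0,Y1,A
    return (a << 16) | b1        # reassemble the 48-bit result
```

The lower word is multiplied as unsigned because only the upper word carries the sign of the 32-bit operand. The returned value equals `(x0 * y) << 1`; for the integer form in Example 3-15, one further arithmetic right shift of the whole result gives `x0 * y`.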


Example 3-15. Integer Single-Precision Times Double-Precision Value - Both Signed


(7 Icyc, 7 Instruction Words)

MPYSU X0,Y0,A ; Single Precision times Lower Portion
MOVE A0,B
MOVE A1,A0 ; 16-bit Arithmetic Right Shift
MOVE A2,A1 ; (note that A2 contains only sign
           ; extension)
MAC X0,Y1,A ; Single Precision x Upper Portion and add to Previous
ASR A ; Convert result to integer, A2 contains sign extension
ROR B ; (52-bit shift of A2:A1:A0:B1)


Example 3-16. Multiplying Two Fractional Double-Precision Values


;
; Signed 32x32 => 64 Multiplication Subroutine
;
; Parameters:
;   R1 = ptr to lowest word of one operand
;   R2 = ptr to lowest word of one operand
;   R3 = ptr to where results are stored
;
MULT_S32_X_S32
    CLR   B                ; clears B2 portion
;
; Multiply lwr1 * lwr2 and save lowest 16-bits of result
;
;   Operation              ;  X0    Y1    Y0     A
;
    MOVE  X:(R1),Y0        ;  ---   ---   lwr1   -----
    ANDC  #CLRMSB,Y0       ;  ---   ---   lwr1'  -----
    MOVE  X:(R2)+,Y1       ;  ---   lwr2  lwr1'  -----
    MPYSU Y0,Y1,A          ;  ---   lwr2  lwr1'  lwr1'.s * lwr2.u
    TSTW  X:(R1)+          ; check if MSB set in original lwr1 value
    BGE   CORRECT_RES1     ; perform correction if this was true
    MOVE  Y1,B1            ;  ---   lwr2  lwr1'  -----
    ADD   B,A              ;  ---   lwr2  lwr1'  lwr1.u * lwr2.u
CORRECT_RES1
    MOVE  A0,X:(R3)+       ;  ---   lwr2  lwr1'  lwr1.u * lwr2.u
;
; Multiply two cross products and save next lowest 16-bits of result
;
;   Operation              ;  X0    Y1    Y0     A
;
    MOVE  A1,X:TMP         ; (arithmetic 16-bit right shift of 36-bit accum)
    MOVE  A2,A             ;  ---   lwr2  lwr1'  -----
    MOVE  X:TMP,A0         ;  ---   lwr2  lwr1'  A = product1 >> 16
    MOVE  X:(R1)-,X0       ;  upr1  lwr2  lwr1'  A = product1 >> 16
    MACSU X0,Y1,A          ;  upr1  lwr2  lwr1'  A+upr1.s*lwr2.u
    MOVE  X:(R1),Y1        ;  upr1  lwr1  lwr1'  A+upr1.s*lwr2.u
    MOVE  X:(R2),Y0        ;  upr1  lwr1  upr2   A+upr1.s*lwr2.u
    MACSU Y0,Y1,A          ;  upr1  lwr1  upr2   A+upr1.s*lwr2.u+upr2.s*lwr1.u
    MOVE  A0,X:(R3)+       ;  upr1  lwr1  upr2   A = result w/ cross prods
;
; Multiply upr1 * upr2 and save
;
;   Operation              ;  X0    Y1    Y0     A
;
    MOVE  A1,X:TMP         ; (arithmetic 16-bit right shift of 36-bit accum)
    MOVE  A2,A             ;  upr1  lwr1  upr2   -----
    MOVE  X:TMP,A0         ;  upr1  lwr1  upr2   A = result >> 16
    MAC   X0,Y0,A          ;  upr1  lwr1  upr2   A + upr1.s * upr2.s
    MOVE  A0,X:(R3)+
    MOVE  A1,X:(R3)+
    RTS
;
; The corresponding algorithm for integer multiplication of 32-bit values
; would be the same as for fractional with the addition of a final arithmetic
; right shift of the 64-bit result.


3.4 Saturation and Data Limiting


DSP algorithms are sometimes capable of calculating values larger than the data precision of the machine 
when processing real data streams. Normally, a processor would allow the value to overflow when this 
occurred, but this creates problems when processing real-time signals. The solution is saturation, a 
technique whereby values that exceed the machine data precision are “clipped,” or converted to the 
maximum value of the same sign that fits within the given data precision. 


Saturation is especially important when data is running through a digital filter whose output goes to a 
digital-to-analog converter (DAC), since it “clips” the output data instead of allowing arithmetic overflow. 
Without saturation, the output data may incorrectly switch from a large positive number to a large negative 
value, which can cause problems for DAC outputs in embedded applications. 


The DSP56800 architecture supports optional saturation of results through two limiters found within the 
data ALU: 


• the Data Limiter
• the MAC Output Limiter


The Data Limiter saturates values when data is moved out of an accumulator with a MOVE instruction or 
parallel move. The MAC Output Limiter saturates the output of the data ALU’s MAC unit. 


3.4.1 Data Limiter 


The data limiter protects against overflow by selectively limiting when reading an accumulator register as 
a source operand in a MOVE instruction. When a MOVE instruction specifies an accumulator (F) as a 
source, and if the contents of the selected source accumulator can be represented in the destination operand 
size without overflow (that is, the accumulator extension register is not in use), the data limiter is enabled but
does not saturate, and the register contents are placed onto the CGDB unmodified. If a MOVE instruction 
is used and the contents of the selected source accumulator cannot be represented without overflow in the 
destination operand size, the data limiter will substitute a “limited” data value onto the CGDB that has 
maximum magnitude and the same sign as the source accumulator, as shown in Table 3-4 on page 3-27. 


The F0 portion of an accumulator is ignored by the data limiter.
Consider a simple example, shown in Example 3-17. 


Example 3-17. Demonstrating the Data Limiter—Positive Saturation 


MOVE #$7FFC,A        ; Initialize A = $0:7FFC:0000
INC  A               ; A = $0:7FFD:0000
MOVE A,X:(R0)+       ; Write $7FFD to memory (limiter enabled)
INC  A               ; A = $0:7FFE:0000
MOVE A,X:(R0)+       ; Write $7FFE to memory (limiter enabled)
INC  A               ; A = $0:7FFF:0000
MOVE A,X:(R0)+       ; Write $7FFF to memory (limiter enabled)
INC  A               ; A = $0:8000:0000 <=== Overflows 16-bits
MOVE A,X:(R0)+       ; Write $7FFF to memory (limiter saturates)
INC  A               ; A = $0:8001:0000
MOVE A,X:(R0)+       ; Write $7FFF to memory (limiter saturates)
INC  A               ; A = $0:8002:0000
MOVE A,X:(R0)+       ; Write $7FFF to memory (limiter saturates)
MOVE A1,X:(R0)+      ; Write $8002 to memory (limiter disabled)
Once the accumulator increments to $8000 in Example 3-17, the positive result can no longer be written to
a 16-bit memory location without overflow. So, instead of writing an overflowed value to memory, the
data limiter block writes the most positive 16-bit number, $7FFF. Note that the
data limiter block does not affect the accumulator; it only affects the value written to memory. In the last
instruction, the limiter is disabled because the register is specified as A1.


Consider a second example, shown in Example 3-18 on page 3-27. 


Example 3-18. Demonstrating the Data Limiter — Negative Saturation 


MOVE #$8003,A        ; Initialize A = $F:8003:0000
DEC  A               ; A = $F:8002:0000
MOVE A,X:(R0)+       ; Write $8002 to memory (limiter enabled)
DEC  A               ; A = $F:8001:0000
MOVE A,X:(R0)+       ; Write $8001 to memory (limiter enabled)
DEC  A               ; A = $F:8000:0000
MOVE A,X:(R0)+       ; Write $8000 to memory (limiter enabled)
DEC  A               ; A = $F:7FFF:0000 <=== Overflows 16-bits
MOVE A,X:(R0)+       ; Write $8000 to memory (limiter saturates)
DEC  A               ; A = $F:7FFE:0000
MOVE A,X:(R0)+       ; Write $8000 to memory (limiter saturates)
DEC  A               ; A = $F:7FFD:0000
MOVE A,X:(R0)+       ; Write $8000 to memory (limiter saturates)
MOVE A1,X:(R0)+      ; Write $7FFD to memory (limiter disabled)


Once the accumulator decrements to $7FFF in Example 3-18, the negative result can no longer fit into a 
16-bit memory location without overflow. So, instead of writing an overflowed value to memory, the value 
of the most negative 16-bit number, $8000, is written instead by the data limiter block. 


Test logic exists in the extension portion of each accumulator register to support the operation of the 
limiter circuit; the logic detects overflows so that the limiter can substitute one of two constants to 
minimize errors due to overflow. This process is called “saturation arithmetic.” When limiting does occur, 
a flag is set and latched in the status register. The value of the accumulator is not changed. 


Table 3-4. Saturation by the Limiter Using the MOVE Instruction 


Extension bits in use in    MSB of F2    Output of Limiter onto the CGDB Bus
selected accumulator?
No                          n/a          Same as input (unmodified MSP)
Yes                         0            $7FFF (maximum positive value)
Yes                         1            $8000 (maximum negative value)


It is possible to bypass this limiting feature when reading an accumulator by reading it out through its 
individual portions. 
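

The behavior summarized in Table 3-4 can be sketched as a small behavioral model. The Python below is only an illustration, not DSP56800 code: the function name `data_limiter` and the `whole_accumulator` flag are hypothetical, and the 36-bit accumulator is assumed to be packed into one integer as F2:F1:F0. The "extension in use" test follows the definition used for the E bit (bits 35-31 not all the same).

```python
MASK36 = (1 << 36) - 1  # a 36-bit accumulator packed as F2:F1:F0


def data_limiter(acc36, whole_accumulator=True):
    """Return the 16-bit word placed on the CGDB when an accumulator is read.

    whole_accumulator=True models MOVE A,<dst> (limiter enabled);
    whole_accumulator=False models MOVE A1,<dst> (limiter bypassed).
    """
    acc36 &= MASK36
    msp = (acc36 >> 16) & 0xFFFF          # F1 portion; F0 is ignored
    if not whole_accumulator:
        return msp
    top5 = (acc36 >> 31) & 0x1F           # bits 35-31
    extension_in_use = top5 not in (0x00, 0x1F)
    if not extension_in_use:
        return msp                        # value fits: pass through unmodified
    sign = (acc36 >> 35) & 1              # MSB of F2
    return 0x8000 if sign else 0x7FFF     # saturate with the same sign


# Values taken from Examples 3-17 and 3-18
print(hex(data_limiter(0x0_7FFF_0000)))                           # fits
print(hex(data_limiter(0x0_8002_0000)))                           # saturates
print(hex(data_limiter(0x0_8002_0000, whole_accumulator=False)))  # bypassed
```

The accumulator value itself is never modified by this function, which mirrors the statement that limiting only affects the value written to the destination.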


Figure 3-14 on page 3-28 demonstrates the importance of limiting. Consider the A accumulator with the 
following 36-bit value to be read to a 16-bit destination: 


0000 1.000 0000 0000 0000 0000 0000 0000 0000 (in binary) 
(+ 1.0 in fractional decimal, $0 8000 0000 in hexadecimal) 


If this accumulator is read without limiting by a MOVE A1,X0 instruction, the 16-bit X0
register would afterward contain the following, assuming signed fractional arithmetic:


1.000 0000 0000 0000 (-1.0 fractional decimal, $8000 in hexadecimal)
This is clearly in error because the value -1.0 in the X0 register greatly differs from the value of +1.0 in the
source accumulator. In this case, overflow has occurred. To minimize the error due to overflow, it is
preferable to write the maximum (“limited”) value the destination can assume. In this example, the limited
value would be:


0.111 1111 1111 1111 (+0.999969 fractional decimal, $7FFF in hexadecimal)


This is clearly closer to the original value, +1.0, than -1.0 is, and thus introduces less error. Saturation is 
equally applicable to both integer and fractional arithmetic. 


Thus, saturation arithmetic can have a large effect in moving from register A1 to register X0. The
instruction MOVE A1,X0 performs a move without limiting, and the instruction MOVE A,X0 performs a
move of the same 16 bits with limiting enabled. The magnitude of the error without limiting is 2.0; with
limiting it is 0.000031.
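

The two error magnitudes quoted above can be checked with a few lines of arithmetic. This is a sketch in Python rather than DSP56800 code; the helper name `q15_to_float` is hypothetical, and it assumes the usual signed fractional interpretation in which a 16-bit word represents its two's-complement value divided by 32768.

```python
def q15_to_float(word):
    """Interpret a 16-bit word as a signed fraction in the range [-1.0, 1.0)."""
    word &= 0xFFFF
    if word & 0x8000:          # sign bit set: take the two's-complement value
        word -= 0x10000
    return word / 32768.0


true_value = 1.0                        # accumulator holds $0:8000:0000 = +1.0
unlimited = q15_to_float(0x8000)        # MOVE A1,X0 copies the raw MSP
limited = q15_to_float(0x7FFF)          # MOVE A,X0 saturates to $7FFF

error_unlimited = abs(true_value - unlimited)   # magnitude 2.0
error_limited = abs(true_value - limited)       # 1/32768, about 0.000031

print(error_unlimited, error_limited)
```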


Without limiting (MOVE A1,X0):  X0 = $8000, |ERROR| = 2.0
With limiting* (MOVE A,X0):     X0 = $7FFF, |ERROR| = 0.000031

*Limiting automatically occurs when the 36-bit operands A and B are read with a MOVE instruction. Note that the
contents of the original accumulator are not changed.


Figure 3-14. Example of Saturation Arithmetic


3.4.2 MAC Output Limiter 


The MAC output limiter optionally saturates or limits results calculated by data ALU arithmetic operations 
such as multiply, add, increment, round, and so on. 


The MAC Output Limiter can be enabled by setting the SA bit in the OMR register. See Section 5.1.9.3, 
“Saturation (SA)—Bit 4,” on page 5-11. 


Consider a simple example, shown in Example 3-19. 


Example 3-19. Demonstrating the MAC Output Limiter


BFSET #$0010,OMR     ; Set SA bit (enables MAC Output Limiter)
MOVE  #$7FFC,A       ; Initialize A = $0:7FFC:0000
NOP
INC   A              ; A = $0:7FFD:0000
INC   A              ; A = $0:7FFE:0000
INC   A              ; A = $0:7FFF:0000
INC   A              ; A = $0:7FFF:FFFF <=== Saturates to 16-bits!
INC   A              ; A = $0:7FFF:FFFF <=== Saturates to 16-bits!
ADD   #9,A           ; A = $0:7FFF:FFFF <=== Saturates to 16-bits!
Once the accumulator increments to $7FFF in Example 3-19, the saturation logic in the MAC Output
Limiter prevents it from growing larger because a larger value could no longer fit into a 16-bit memory location without
overflow. So instead of writing an overflowed value back to the A accumulator, the value of the most
positive 32-bit number, $7FFF:FFFF, is written instead as the arithmetic result.


The saturation logic operates by checking 3 bits of the 36-bit result out of the MAC unit: EXT[3], EXT[0],
and MSP[15]. When the SA bit is set, these 3 bits determine if saturation is performed on the MAC unit’s 
output and whether to saturate to the maximum positive value ($7FFF:FFFF) or the maximum negative 
value ($8000:0000), as shown in Table 3-5. 


Table 3-5. MAC Unit Outputs with Saturation Enabled 


EXT[3]   EXT[0]   MSP[15]   Result Stored in Accumulator
0        0        0         Result out of MAC array with no limiting occurring
0        0        1         $0:7FFF:FFFF
0        1        0         $0:7FFF:FFFF
0        1        1         $0:7FFF:FFFF
1        0        0         $F:8000:0000
1        0        1         $F:8000:0000
1        1        0         $F:8000:0000
1        1        1         Result out of MAC array with no limiting occurring
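

The decision in Table 3-5 reduces to one test: the result passes through unchanged only when EXT[3], EXT[0], and MSP[15] all agree; otherwise EXT[3] selects the saturation constant. The following Python sketch models that rule. It is a behavioral illustration only; the names `mac_output_limiter` and `inc_msp` are hypothetical, and `inc_msp` assumes, as the comments in Example 3-19 show, that each INC adds one at the MSP position.

```python
MASK36 = (1 << 36) - 1


def mac_output_limiter(result36, sa_bit=True):
    """Apply the Table 3-5 rule to a 36-bit MAC unit result."""
    result36 &= MASK36
    if not sa_bit:
        return result36                   # limiter disabled: no change
    ext3 = (result36 >> 35) & 1
    ext0 = (result36 >> 32) & 1
    msp15 = (result36 >> 31) & 1
    if ext3 == ext0 == msp15:
        return result36                   # rows 000 and 111: no limiting
    # Any other combination saturates; EXT[3] carries the sign.
    return 0xF_8000_0000 if ext3 else 0x0_7FFF_FFFF


def inc_msp(acc36, sa_bit=True):
    """Hypothetical helper: add 1 at the MSP position, then limit."""
    return mac_output_limiter(acc36 + 0x1_0000, sa_bit)


a = 0x0_7FFF_0000
a = inc_msp(a)          # would be $0:8000:0000, so it saturates
print(f"{a:09X}")
```

With `sa_bit=False` the same increment yields $0:8000:0000, which is the value the Data Limiter would later have to clip on a MOVE.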


The MAC Output Limiter not only affects the results calculated by the instruction, but can also affect 
condition code computation as well. See Appendix A.4.2, “Effects of the Operating Mode Register’s SA 
Bit,” on page A-11 for more information. 


3.4.3 Instructions Not Affected by the MAC Output Limiter 


The MAC Output Limiter is always disabled (even if the SA bit is set) when the following instructions are 
being executed: 


• ASLL, ASRR, LSRR
• ASRAC, LSRAC
• IMPY
• MPYSU, MACSU
• AND, OR, EOR
• LSL, LSR, ROL, ROR, NOT
• TST


The CMP instruction is not affected by the OMR’s SA bit except when the first operand is not a register
(that is, it is a memory location or an immediate value) and the second operand is the X0, Y0, or Y1
register. In this particular case, the U bit calculation is affected by the SA bit. No other bits are affected by
the SA bit for the CMP instruction.
Also, the MAC Output Limiter only affects operations performed in the data ALU. It has no effect on
instructions executed in other blocks of the core, such as the following: 


• Bit manipulation instructions (Table 6-29 and Table 6-30 on page 6-26)
• Move instructions (Table 6-17 through Table 6-20)
• Looping instructions (Table 6-32 on page 6-27)
• Change of flow instructions (Table 6-31 on page 6-27)
• Control instructions (Table 6-33 on page 6-28)


NOTE: 


The SA bit affects the TFR instruction when it is set, optionally limiting 
data as it is transferred from one accumulator to another. 


3.5 Rounding 


The DSP56800 provides three instructions that can perform rounding—RND, MACR, and MPYR. The 
RND instruction simply rounds a value in the accumulator register specified by the instruction, whereas 
the MPYR or MACR instructions round the result calculated by the instruction in the MAC array. Each 
rounding instruction rounds the result to a single-precision value so the value can be stored in memory or 
in a 16-bit register. In addition, for instructions where the destination is one of the two accumulators, the 
LSP of the destination accumulator (A0 or B0) is set to $0000.


The DSP core implements two types of rounding: convergent rounding and two’s-complement rounding. 
For the DSP56800, the rounding point is between bits 16 and 15 of a 36-bit value; for the A accumulator, it 
is between the A1 register’s LSB and the A0 register’s MSB. The usual rounding method rounds up any
value above one-half (that is, LSP > $8000) and rounds down any value below one-half (that is, LSP < 
$8000). The question arises as to which way the number one-half (LSP = $8000) should be rounded. If it is 
always rounded one way, the results will eventually be biased in that direction. Convergent rounding 
solves the problem by rounding down if the number is even (bit 16 equals zero) and rounding up if the 
number is odd (bit 16 equals one), whereas two’s-complement rounding always rounds this number up. 
The type of rounding is selected by the rounding bit (R) of the operating mode register (OMR) in the 
program controller. 
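

The difference between the two modes can be sketched in a few lines. The Python below is a behavioral model, not DSP56800 code: the function name `round_accumulator` and its `twos_complement` flag (standing in for the R bit of the OMR) are hypothetical, the accumulator is assumed to be packed as a 36-bit integer, and no saturation is modeled.

```python
MASK36 = (1 << 36) - 1


def round_accumulator(acc36, twos_complement=False):
    """Round a 36-bit accumulator at the A1/A0 boundary and clear the LSP.

    twos_complement=False models convergent rounding (the default mode);
    twos_complement=True models two's-complement rounding.
    """
    acc36 &= MASK36
    lsp = acc36 & 0xFFFF            # A0: the part being rounded away
    upper = acc36 >> 16             # A2:A1, 20 bits
    if lsp > 0x8000:
        upper += 1                  # above one-half: always round up
    elif lsp == 0x8000:
        # Exactly one-half: the two modes differ only here.
        if twos_complement or (upper & 1):
            upper += 1
    # Below one-half: round down (add nothing).
    return (upper << 16) & MASK36   # LSP is cleared in the result


print(f"{round_accumulator(0x0_0004_8000):09X}")                        # even: down
print(f"{round_accumulator(0x0_0005_8000):09X}")                        # odd: up
print(f"{round_accumulator(0x0_0004_8000, twos_complement=True):09X}")  # always up
```

For two's-complement rounding the same result is obtained by adding $8000 to the accumulator and truncating, which is how the mode is described in Section 3.5.2.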


3.5.1 Convergent Rounding 


This is the default rounding mode. This rounding is also called “round to nearest even number.” For most
values, this mode rounds identically to two’s-complement rounding; it only differs for the case where the
least significant 16 bits are exactly $8000. For this case, convergent rounding prevents any introduction of a
bias by rounding down if the number is even (bit 16 equals zero) and rounding up if the number is odd
(bit 16 equals one). Figure 3-15 on page 3-31 shows the four possible cases for rounding a number in the A
or B accumulator.
Bit positions: A2 = bits 35-32, A1 = bits 31-16, A0 = bits 15-0.

Case I: If A0 < $8000 (1/2), then round down (add nothing)

                    A2       A1              A0
Before rounding:    XX..XX   XXX...XXX0100   011XXX....XXX
After rounding:     XX..XX   XXX...XXX0100   000.........000*

Case II: If A0 > $8000 (1/2), then round up (add 1 to A1)

                    A2       A1              A0
Before rounding:    XX..XX   XXX...XXX0100   1110XX....XXX
After rounding:     XX..XX   XXX...XXX0101   000.........000*

Case III: If A0 = $8000 (1/2), and the LSB of A1 = 0 (even), then round down (add nothing)

Case IV: If A0 = $8000 (1/2), and the LSB of A1 = 1 (odd), then round up (add 1 to A1)

*A0 is always clear; performed during RND, MPYR, and MACR


Figure 3-15. Convergent Rounding


3.5.2 Two’s-Complement Rounding 


When this type of rounding is selected by setting the rounding bit in the OMR, one is added to the bit to the 
right of the rounding point (bit 15 of A0) before the bit truncation during a rounding operation. Figure 3-16
shows the two possible cases. 
Case I: If A0 < 0.5 ($8000), then round down

Case II: If A0 >= 0.5 ($8000), then round up

*A0 is always clear; performed during RND, MPYR, and MACR


Figure 3-16. Two’s-Complement Rounding


Once the rounding bit has been programmed in the OMR register, there is a delay of one instruction cycle 
before the new rounding mode becomes active. 
3.6 Condition Code Generation


The DSP core supports many different arithmetic instructions for both word and long-word operations. 
The flexible nature of the instruction set means that condition codes must also be generated correctly for 
the different combinations allowed. There are three questions to consider when condition codes are 
generated for an instruction: 


• Is the arithmetic operation’s destination an accumulator, or a 16-bit register or memory location?
• Does the instruction operate on the whole accumulator or only on the upper portion?
• Is the CC bit set in the program controller’s OMR register?


The CC bit in the OMR register allows condition codes to be generated without an examination of the 
contents of the extension register. This sets up a computing environment where there is effectively no 
extension register because its contents are ignored. Typically, the extension register is most useful in DSP 
operations. For the case of general-purpose computing, the CC bit is often set when the program is not 
performing DSP tasks. However, it is possible to execute any instruction with the CC bit set or cleared, 
except for instructions that use one of the unsigned condition codes (HS, LS, HI, or LO). 


This section covers different aspects of condition code generation for the different instructions and 
configurations on the DSP core. Note that the L, E, and U bits are computed the same regardless of the size 
of the destination or the value of the CC bit: 


• L is set if overflow occurs or limiting occurs in a parallel move.
• E is set if the extension register is in use (that is, if bits 35-31 are not all the same).
• U is set according to the standard definition of the U bit.


3.6.1 36-Bit Destinations—CC Bit Cleared 


Most arithmetic instructions generate a result for a 36-bit accumulator. When condition codes are being 
generated for this case and the CC bit is cleared, condition codes are generated using all 36 bits of the 
accumulator. Examples of instructions in this category are ADC, ADD, ASL, CMP, MAC, MACR, MPY, 
MPYR, NEG, NORM, and RND. 


The condition codes for 36-bit destinations are computed as follows: 


• N is set if bit 35 of the corresponding accumulator is set except during saturation. During a
saturation condition, the V (overflow) bit is set and the N bit is not set.
• Z is set if bits 35-0 of the corresponding accumulator are all cleared.
• V is set if overflow has occurred in the 36-bit result.
• C is set if a carry (borrow) has occurred out of bit 35 of the result.
3.6.2 36-Bit Destinations—CC Bit Set


Most arithmetic instructions generate a result for a 36-bit accumulator. When condition codes are being 
generated for this case and the CC bit is set, condition codes are generated using only the 32 bits of the 
accumulator located in the MSP and LSP. There may be values in the extension registers, but the contents 
of the extension register are ignored. It is effectively the same as if there is no extension register. Examples 
of instructions in this category are ADC, ADD, ASL, CMP, MAC, MACR, MPY, MPYR, NEG, NORM, 
and RND. 


The condition codes for 32-bit destinations (CC equals one) are computed as follows: 
• N is set if bit 31 of the corresponding accumulator is set.
• Z is set if bits 31-0 of the corresponding accumulator are all cleared.
• V is set if overflow has occurred in the 32-bit result.
• C is set if a carry (borrow) has occurred out of bit 31 of the result.
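

The effect of the CC bit on the N and Z bits for accumulator destinations can be sketched as follows. This Python model is an illustration only: the function name `accumulator_flags` is hypothetical, the accumulator is assumed to be packed as a 36-bit integer, and the saturation exception for N is not modeled. V and C are omitted because they depend on the operation that produced the result, and E is included to show that it is computed the same way in both modes.

```python
MASK36 = (1 << 36) - 1


def accumulator_flags(acc36, cc_bit):
    """Return the N, Z, and E bits for a 36-bit accumulator result.

    cc_bit=0: N and Z use all 36 bits (Section 3.6.1).
    cc_bit=1: N and Z use only bits 31-0; the extension is ignored.
    """
    acc36 &= MASK36
    top5 = (acc36 >> 31) & 0x1F
    e = top5 not in (0x00, 0x1F)           # bits 35-31 not all the same
    if cc_bit:
        n = bool((acc36 >> 31) & 1)
        z = (acc36 & 0xFFFF_FFFF) == 0
    else:
        n = bool((acc36 >> 35) & 1)
        z = acc36 == 0
    return {"N": n, "Z": z, "E": e}


# $1:0000:0000 is non-zero as a 36-bit value but zero in its low 32 bits.
print(accumulator_flags(0x1_0000_0000, cc_bit=0))
print(accumulator_flags(0x1_0000_0000, cc_bit=1))
```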


3.6.3 20-Bit Destinations—CC Bit Cleared 


Two arithmetic instructions generate a result for the upper two portions of an accumulator, the MSP and 
the extension register, leaving the LSP of the accumulator unchanged. When condition codes are being 
generated for this case and the CC bit is cleared, condition codes are generated using the 20 bits in the 
upper two portions of the accumulator. The two instructions in this category are DECW and INCW. 


The condition codes for DECW and INCW (CC equals zero) are computed as follows: 


• N is set if bit 35 of the corresponding accumulator is set except during saturation. During a
saturation condition, the V (overflow) bit is set and the N bit is not set.
• Z is set if bits 35-16 of the corresponding accumulator are all cleared.
• V is set if overflow has occurred in the 20-bit result.
• C is set if a carry (borrow) has occurred out of bit 35 of the result.


3.6.4 20-Bit Destinations—CC Bit Set 


Two arithmetic instructions generate a result for the upper two portions of an accumulator, the MSP and 
the extension register, leaving the LSP of the accumulator unchanged. When condition codes are being 
generated for this case and the CC bit is set, the bits in the extension register and the LSP of the 
accumulator are not used to calculate condition codes. The two instructions in this category are DECW and 
INCW. 


The condition codes for 16-bit destinations (CC equals one) are computed as follows: 
• N is set if bit 31 of the corresponding accumulator is set.
• Z is set if bits 31-16 of the corresponding accumulator are all cleared.
• V is set if overflow has occurred in the 16-bit result.
• C is set if a carry (borrow) has occurred out of bit 31 of the result.
3.6.5 16-Bit Destinations


Some arithmetic instructions can generate a result for a 36-bit accumulator or a 16-bit destination such as a 
register or memory location. When condition codes for a 16-bit destination are being generated, the CC bit 
is ignored and condition codes are generated using the 16 bits of the result. Instructions in this category are 
ADD, CMP, SUB, DECW, INCW, MAC, MACR, MPY, MPYR, ASR, and ASL. 


The condition codes for 16-bit destinations are computed as follows: 
• N is set if bit 15 of the result is set.
• Z is set if bits 15-0 of the result are all cleared.
• V is set if overflow has occurred in the 16-bit result.
• C is set if a carry (borrow) has occurred out of bit 15 of the result.


Other instructions, such as the logical instructions, generate results only for a 16-bit destination. When
condition codes are being generated for this case, the CC bit is ignored and condition codes are generated
using the 16 bits of the result. Instructions in this category are AND, EOR, LSL, LSR, NOT, OR, ROL,
and ROR. The rules for condition code generation are presented below for the cases where the destination is a
16-bit register or 16 bits of a 36-bit accumulator.


The condition codes for logical instructions with 16-bit registers as destinations are computed as follows: 
• N is set if bit 15 of the corresponding register is set.
• Z is set if bits 15-0 of the corresponding register are all cleared.
• V is always cleared.
• C is computed in an instruction-dependent way.


The condition codes for logical instructions with 36-bit accumulators as destinations are computed as 
follows: 


• N is set if bit 31 of the corresponding accumulator is set.
• Z is set if bits 31-16 of the corresponding accumulator are all cleared.
• V is always cleared.
• C is computed in an instruction-dependent way.


3.6.6 Special Instruction Types 


Some instructions do not follow the preceding rules for condition code generation, and must be considered 
separately. Examples of instructions in this category are the logical and bit-field instructions (ANDC, 
EORC, NOTC, ORC, BFCHG, BFCLR, BFSET, BFTSTL, BFTSTH, BRCLR, and BRSET), the CLR 
instruction, the IMPY(16) instruction, the multi-bit shifting instructions (ASLL, ASRR, LSLL, LSRR, 
ASRAC, and LSRAC), and the DIV instruction. 


The bit-field instructions only affect the C and the L bits. The CLR instruction only generates condition 
codes when clearing an accumulator. The condition codes are not modified when clearing any other 
register. Some of the condition codes are not defined after executing the IMPY(16) and multi-bit shifting 
instructions. The DIV instruction only affects a subset of all the condition codes. See Appendix A.4, 
“Condition Code Computation,” on page A-6 for details on the condition code computation for each of 
these instructions. 
3.6.7 TST and TSTW Instructions


There are two instructions, TST and TSTW, that are useful for checking the value in a register or memory 
location. 


The condition codes for the TST instruction (on a 36-bit accumulator) with CC equal to zero are computed 
as follows: 


• L is set if limiting occurs in a parallel move.
• E is set if the extension register is in use (that is, if bits 35-31 are not all the same).
• U is set according to the standard definition of the U bit.
• N is set if bit 35 of the corresponding accumulator is set except during saturation.
• Z is set if bits 35-0 of the corresponding accumulator are all cleared.
• V is always cleared.
• C is always cleared.


The condition codes for the TST instruction (on a 36-bit accumulator) with CC equal to one are computed 
as follows: 


• L is set if limiting occurs in a parallel move.
• E is set if the extension register is in use (that is, if bits 35-31 are not all the same).
• U is set according to the standard definition of the U bit.
• N is set if bit 31 of the corresponding accumulator is set.
• Z is set if bits 31-0 of the corresponding accumulator are all cleared.
• V is always cleared.
• C is always cleared.


The condition codes for the TSTW instruction (on a 16-bit value) are computed as follows: 


• N is set if the MSB of the 16-bit value is set.
• Z is set if all 16 bits of the 16-bit value are cleared.
• V is always cleared.
• C is always cleared.


3.6.8 Unsigned Arithmetic 


When arithmetic on unsigned operands is being performed, the condition codes used to compare two 
values differ from those used for signed arithmetic. See Section 3.3.7, “Unsigned Arithmetic,” for a 
discussion of condition code usage for unsigned arithmetic. 
Chapter 4
Address Generation Unit 


This chapter describes the architecture and the operation of the address generation unit (AGU). The 
address generation unit is the block where all address calculations are performed. It contains two 
arithmetic units—a modulo arithmetic unit for complex address calculations and an 
incrementer/decrementer for simple calculations. The modulo arithmetic unit can be used to calculate 
addresses in a modulo fashion, automatically wrapping around when necessary. A set of pointer registers, 
special-purpose registers, and multiple buses within the unit allow up to two address updates or a memory 
transfer to or from the AGU in a single cycle. 


The capabilities of the address generation unit include the following operations: 
• Provide one address to X data memory on the XAB1 bus
• Post-update an address after providing the original address value on XAB1 bus
• Calculate an effective address which is then provided on the XAB1 bus
• Provide two addresses to X data memory on the XAB1 and XAB2 buses and post-update both
addresses
• Provide one address to program memory for program memory data accesses and post-update the
address
• Increment or decrement a counter during normalization operations
• Provide a conditional register move (Tcc instruction)


Note that in the cases where the address generation unit is generating one or two addresses to access X data 
memory, the program controller generates a second or third address used to concurrently fetch the next 
instruction. 


The AGU provides many different addressing modes, which include the following: 


• Indirect addressing with no update
• Indirect addressing with post-increment
• Indirect addressing with post-decrement
• Indirect addressing with post-update by a register
• Indirect addressing with index by a 16-bit offset
• Indirect addressing with index by a 6-bit offset
• Indirect addressing with index by a register
• Immediate data
• Immediate short data
• Absolute addressing
• Absolute short addressing
• Peripheral short addressing
• Register direct
• Implicit
This chapter covers the architecture and programming model of the address generation unit, its addressing
modes, and a discussion of the linear and modulo arithmetic capabilities of this unit. It concludes with a 
discussion of pipeline dependencies related to the address generation unit. 


4.1 Architecture and Programming Model 


The major components of the address generation unit are as follows: 
• Four address registers (R0-R3)
• A stack pointer register (SP)
• An offset register (N)
• A modifier register (M01)
• A modulo arithmetic unit
• An incrementer/decrementer unit


The AGU uses integer arithmetic to perform the effective address calculations necessary to address data 
operands in memory. The AGU also contains the registers used to generate the addresses. It implements 
linear and modulo arithmetic and operates in parallel with other chip resources to minimize 
address-generation overhead. 


Two ALUs are present within the AGU: the modulo arithmetic unit and the incrementer/decrementer unit. 
The two arithmetic units can generate up to two 16-bit addresses and two address updates every instruction 
cycle: one for XAB1 and one for XAB2 for instructions performing two parallel memory reads. The AGU 
can directly address 65,536 locations on XAB1 and 65,536 locations on the PAB. The AGU can directly 
address up to 65,536 locations on XAB2, but can only generate addresses to on-chip memory. The two 
ALUs work with the data memory to access up to two locations and provide two operands to the data ALU 
in a single cycle. The primary operand is addressed with the XAB1, and the second operand is addressed 
with the XAB2. The data memory, in turn, places its data on the core global data bus (CGDB) and the 
second external data bus (XDB2), respectively (see Figure 4-1 on page 4-3). See Section 6.1, “Introduction 
to Moves and Parallel Moves,” on page 6-1 for more discussion on parallel memory moves. 
[Figure: block diagram of the AGU showing its arithmetic units and the CGDB(15:0), PAB(15:0), XAB1(15:0), and XAB2(15:0) buses]


Figure 4-1. Address Generation Unit Block Diagram


All four address pointer registers and the SP are used in generating addresses in the register indirect
addressing modes. The offset register can be used by all four address pointer registers and the SP, whereas
the modifier register (M01) can be used by the R0 or by both the R0 and R1 pointer registers.


Whereas all the address pointer registers and the SP can be used in many addressing modes, there are some 
instructions that only work with a specific address pointer register. These cases are presented in Table 4-5 
on page 4-9. 

The address generation unit is connected to four major buses: CGDB, XAB1, XAB2, and PAB. The
CGDB is used to read or write any of the address generation unit registers. The XAB1 and XAB2 provide
a primary and secondary address, respectively, to the X data memory, and the PAB provides the address
when accessing the program memory.


A block diagram of the address generation unit is shown in Figure 4-1, and its corresponding programming 
model is shown in Figure 4-2. The blocks and registers are explained in the following subsections. 


[Figure: 16-bit (bits 15-0) pointer registers, offset register, and modifier register (M01)]


Figure 4-2. Address Generation Unit Programming Model
4.1.1 Address Registers (R0-R3)


The address register file consists of four 16-bit registers R0-R3 (Rn) that usually contain addresses used as
pointers to memory. Each register may be read or written by the CGDB. High speed access to the XAB1, 
XAB2, and PAB buses is required to allow maximum access time for the internal and external X data 
memory and program memory. Each address register may be used as input for the modulo arithmetic unit 
for a register update calculation. Each register may be written by the output of the modulo arithmetic unit. 


The R3 register may be used as input to a separate incrementer/decrementer unit for an independent 
register update calculation. This unit is used in the case of any instruction that performs two data memory 
reads in its parallel move field. For instructions where two reads are performed from the X data memory, 
the second read using the R3 pointer must always access on-chip memory. 


NOTE: 


Due to pipelining, if an address register (Rn, SP, or M01) is changed with 
a MOVE or bit-field instruction, the new contents will not be available for 
use as a pointer until the second following instruction. If the SP is changed, 
no LEA or POP instructions are permitted until the second following 
instruction. 


4.1.2 Stack Pointer Register (SP) 


The stack pointer register (SP) is a single 16-bit register that is used implicitly in all PUSH instruction 
macros and POP instructions. The SP is used explicitly for memory references when used with the 
address-register-indirect modes. It is post-decremented on all POPs from the software stack. The SP 
register may be read or written by the CGDB. 


NOTE: 


This register must be initialized explicitly by the programmer after coming 
out of reset. 


Due to pipelining, if an address register (Rn, SP, or M01) is changed with 
a MOVE or bit-field instruction, the new contents will not be available for 
use as a pointer until the second following instruction. If the SP is changed, 
no LEA or POP instructions are permitted until the second following 
instruction. 


4.1.3 Offset Register (N) 


The offset register (N) usually contains offset values used to update address pointers. This single register 
can be used to update or index with any of the address registers (R0-R3, SP). This offset register may be
read or written by the CGDB. The offset register is used as input to the modulo arithmetic unit. It is often 
used for array indexing or indexing into a table, as discussed in Section 8.7, “Array Indexes,” on page 
8-26. 
NOTE:


If the N address register is changed with a MOVE instruction, this 
register’s contents will be available for use on the immediately following 
instruction. In this case the instruction that writes the N address register 
will be stretched one additional instruction cycle. This is true for the case 
when the N register is used by the immediately following instruction; if N 
is not used, then the instruction is not stretched an additional cycle. If the 
N address register is changed with a bit-field instruction, the new contents 
will not be available for use until the second following instruction. 


4.1.4 Modifier Register (M01) 


The modifier register (M01) specifies whether linear or modulo arithmetic is used when calculating a new 
address and may be read or written by the CGDB. This modifier register is automatically read when the R0 
address register is used in an address calculation and can optionally also be used when R1 is used. This 
register has no effect on address calculations done with the R2, R3, or SP registers. It is used as input to the 
modulo arithmetic unit. This modifier register is preset during a processor reset to $FFFF (linear 
arithmetic). 


NOTE: 


Due to pipelining, if an address register (Rn, SP, or M01) is changed with 
a MOVE or bit-field instruction, the new contents will not be available for 
use as a pointer until the second following instruction. If the SP is changed, 
no LEA or POP instructions are permitted until the second following 
instruction. 


4.1.5 Modulo Arithmetic Unit 


The modulo arithmetic unit can update one address register or the SP during one instruction cycle. It is 
capable of performing linear and modulo arithmetic, as described in Section 4.3, “AGU Address 
Arithmetic.” The contents of the modifier register specify the type of arithmetic to be performed in an 
address register update calculation. The modifier value is decoded in the modulo arithmetic unit and 
affects the unit’s operation. The modulo arithmetic unit’s operation is data-dependent and requires 
execution cycle decoding of the selected modifier register contents. Note that the modulo capability is only 
allowed for R0 or R1 updates; it is not allowed for R2, R3, or SP updates. 


The modulo arithmetic unit first calculates the result of linear arithmetic (for example, Rn+1, Rn-1, Rn+N) 
which is selected as the modulo arithmetic unit’s output for linear arithmetic. For modulo arithmetic, the 
modulo arithmetic unit will perform the function (Rn+N) modulo (M01+1), where N can be 1, -1, or the 
contents of the offset register N. If the modulo operation requires “wraparound” for modulo arithmetic, the 
summed output of the modulo adder will give the correct, updated address register value; otherwise, if 
wraparound is not necessary, the linear arithmetic calculation gives the correct result. 


4.1.6 Incrementer/Decrementer Unit 


The incrementer/decrementer unit is used for address-update calculations during dual data-memory read 
instructions. It is used either to increment or decrement the R3 register. This adder performs only linear 
arithmetic; it performs no modulo arithmetic. 


4.2 Addressing Modes 


The DSP56800 instruction set contains a full set of operand addressing modes, optimized for 
high-performance signal processing as well as efficient controller code. All address calculations are 
performed in the address generation unit to minimize execution time. 


Addressing modes specify where the operand or operands for an instruction can be found—whether an 
immediate value, located in a register, or in memory—and provide the exact address of the operand(s). 


The addressing modes are grouped into four categories: 
• Register direct: directly references the processor registers as operands 
• Address register indirect: uses an address register as a pointer to reference a location in memory 
  as an operand 
• Immediate: the operand is contained as a value within the instruction itself 
• Absolute: uses an address contained within the instruction to reference a location in memory as an 
  operand 


An effective address in an instruction will specify an addressing mode (that is, where the operands can be 
found), and for some addressing modes the effective address will further specify an address register that 
points to a location in memory, how the address is calculated, and how the register is updated. 


These addressing modes are referred to extensively in Section 6.5.2, “LSLL Alias,” on page 6-13. 


Several of the examples in the following sections demonstrate the use of assembler forcing operators. 
These can be used in an instruction to force a desired addressing mode, as shown in Table 4-1. 


Table 4-1. Addressing Mode Forcing Operators 


Desired Action                  Forcing Operator Syntax   Example 
Force immediate short data      #<xx                      #<$07 
Force 16-bit immediate data     #>xxxx                    #>$07 
Force absolute short address    X:<xx                     X:<$02 
Force I/O short address         X:<<xx                    X:<<$FFE3 
Force 16-bit absolute address   X:>xxxx                   X:>$02 
Force short offset              X:(SP-<xx)                X:(SP-<$02) 
Force 16-bit offset             X:(Rn+>xxxx)              X:(R0+>$03) 


Other assembler forcing operators are available for jump and branch instructions, as shown in Table 4-2. 


Table 4-2. Jump and Branch Forcing Operators 


Desired Action                        Forcing Operator Syntax   Example 
Force 7-bit relative branch offset    <xx                       <LABEL1 
Force 16-bit absolute jump address    >xxxx                     >LABEL5 
Force 16-bit absolute loop address    >xxxx                     >LABEL4 


4.2.1 Register-Direct Modes 


The register-direct addressing modes specify that the operand is in one (or more) of the nine data ALU 
registers, seven address registers, or four control registers. The various options are shown in Table 4-3 on 
page 4-7. 




Table 4-3. Addressing Mode—Register Direct 
Addressing Mode:   Notation for Register Direct in      Examples 
Register Direct    the Instruction Set Summary (1) 
Any register       DD, DDDDD, HHH, HHHH,                A, A2, A1, A0 
                   F, F1, F1DD, FDD, Rj, Rn             B, B2, B1, B0 
                                                        Y, Y1, Y0 
                                                        X0 
                                                        R0, R1, R2, R3 
                                                        SP 
                                                        N 
                                                        M01 
                                                        PC 
                                                        OMR, SR 
                                                        LA, LC 
                                                        HWS 


1. The register field notations found in the middle column are explained in more detail 
in Table 6-16 on page 6-16 and Table 6-15 on page 6-15. 


4.2.1.1 Data or Control Register Direct 


The operand is in one, two, or three data ALU register(s) as specified in the operands or in a portion of the 
data bus movement field in the instruction. This addressing mode is also used to specify a control register 
operand. This reference is classified as a register reference. 


4.2.1.2 Address Register Direct 


The operand is in one of the seven address registers (R0-R3, N, M01, or SP) specified by an effective 
address in the instruction. This reference is classified as a register reference. 


NOTE: 


Due to pipelining, if any address register is changed with a MOVE or 
bit-field instruction, the new contents will not be available for use as a 
pointer until the second following instruction. If the SP is changed, no 
LEA or POP instructions are permitted until the second following 
instruction. 


4.2.2 Address-Register-Indirect Modes 


When an address register is used to point to a memory location, the addressing mode is called address 
register indirect. The term indirect is used because the operand is not the address register itself, but the 
contents of the memory location pointed to by the address register. The effective address in the instruction 
specifies the address register Rn or SP and the address calculation to be performed. These addressing 
modes specify that the operand is (or operands are) in memory and provide the specific address(es) of the 
operand(s). A portion of the data bus movement field in the instruction specifies the memory reference to 
be performed. The type of address arithmetic used is specified by the address modifier register. 


Table 4-4. Addressing Mode—Address Register Indirect 


Addressing Mode:                       Notation in the Instruction   Examples 
Address Register Indirect              Set Summary (1) 

Accessing Program (P) Memory 
Post-increment                         P:(Rj)+                       P:(R0)+ 
Post-update by offset N                P:(Rj)+N                      P:(R3)+N 

Accessing Data (X) Memory 
No update                              X:(Rn)                        X:(R3) 
                                                                     X:(N) 
                                                                     X:(SP) 
Post-increment                         X:(Rn)+                       X:(R1)+ 
                                                                     X:(SP)+ 
Post-decrement                         X:(Rn)-                       X:(R3)- 
                                                                     X:(N)- 
Post-update by offset N or N3          X:(Rn)+N                      X:(R1)+N 
  (available for word accesses only) 
Indexed by offset N                    X:(Rn+N)                      X:(R2+N) 
                                                                     X:(SP+N) 
Indexed by 6-bit displacement          X:(R2+xx)                     X:(R2+15) 
  (R2 and SP registers only)           X:(SP-xx)                     X:(SP-$1E) 
Indexed by 16-bit displacement         X:(Rn+xxxx)                   X:(R0-97) 
                                                                     X:(N+1234) 
                                                                     X:(SP+$03F7) 


1. Rj represents one of the four pointer registers R0-R3; Rn is any of the AGU address 
registers R0-R3 or SP. 


Address-register-indirect modes may require an offset and a modifier register for use in address 
calculations. The address register (Rn or SP) supplies the pointer, the shared offset register is 
used to specify an optional offset from this pointer, and the modifier register is used to specify the type of 
arithmetic performed. 


Some addressing modes are only available with certain address registers. For example, although all 
address registers support the “indexed by long displacement” addressing mode, only the R2 and SP 
registers support the “indexed by short displacement” addressing mode. For instructions where two reads 
are performed from the X data memory, the second read using the R3 pointer must always be from on-chip 
memory. The addressed register sets are summarized in Table 4-5. 


Table 4-5. Address-Register-Indirect Addressing Modes Available 


Register   Arithmetic         Addressing Modes   Notes 
Set        Types              Allowed 

R0/M01/N   Linear or modulo   (R0)               R0 always uses the M01 register to specify 
                              (R0)+              modulo or linear arithmetic. R0 can optionally 
                              (R0)-              be used as a source register for the Tcc 
                              (R0)+N             instruction. R0 is the only register allowed 
                              (R0+N)             as a counter for the NORM instruction. 
                              (R0+xxxx) 

R1/M01/N   Linear or modulo   (R1)               R1 optionally uses the M01 register to specify 
                              (R1)+              modulo or linear arithmetic. R1 can optionally 
                              (R1)-              be used as a destination register for the Tcc 
                              (R1)+N             instruction. 
                              (R1+N) 
                              (R1+xxxx) 

R2/N       Linear             (R2)               R2 supports a one-word indexed addressing mode. 
                              (R2)+              R2 is not allowed as either pointer for 
                              (R2)-              instructions that perform two reads from X data 
                              (R2)+N             memory. No modulo arithmetic is allowed. 
                              (R2+N) 
                              (R2+xx) 
                              (R2+xxxx) 

R3/N       Linear             (R3)               R3 provides a second address for instructions 
                              (R3)+              with two reads from data memory. This second 
                              (R3)-              address can only access internal memory. It can 
                              (R3)+N             also be used for instructions that perform one 
                              (R3+N)             access to data memory. No modulo arithmetic is 
                              (R3+xxxx)          allowed. 

SP/N       Linear             (SP)               The SP supports a one-word indexed addressing 
                              (SP)-              mode, which is useful for accessing local 
                              (SP)+              variables and passed parameters. No modulo 
                              (SP)+N             arithmetic is allowed. 
                              (SP+N) 
                              (SP-xx) 
                              (SP+xxxx) 


The type of arithmetic to be performed is not encoded in the instruction, but it is specified by the address 
modifier register (M01 for the DSP56800 core). It indicates whether linear or modulo arithmetic is 
performed when doing address calculations. In the case where there is not a modifier register for a 
particular register set (R2 or R3), linear addressing is always performed. For address calculations using R0, 
the modifier register is always used; for calculations using R1, the modifier register is optionally used. 


Each address-register-indirect addressing mode is illustrated in the following subsections. 


4.2.2.1 No Update: (Rn), (SP) 


The address of the operand is in the address register Rn or SP. The contents of the Rn register are 
unchanged. The M01 and N registers are ignored. This reference is classified as a memory reference. See 
Figure 4-3. 


No Update Example: MOVE A1,X:(R0) 

[Figure: contents of accumulator A and X memory before and after execution.] 

Assembler syntax: X:(Rn), X:(SP) 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-3. Address Register Indirect: No Update 


4.2.2.2 Post-Increment by 1: (Rn)+, (SP)+ 


The address of the operand is in the address register Rn or SP. After the operand address is used, it is 
incremented by one and stored in the same address register. The type of arithmetic (linear or modulo) used 
to increment Rn is determined by M01 for R0 and R1 and is always linear for R2, R3, and SP. The N 
register is ignored. This reference is classified as a memory reference. See Figure 4-4. 


Post-Increment Example: MOVE B0,X:(R1)+ 

[Figure: contents of accumulator B, X memory, N (not used), and M01 ($FFFF) before and after 
execution.] 

Assembler syntax: X:(Rn)+, X:(SP)+, P:(Rn)+ 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-4. Address Register Indirect: Post-Increment 


4.2.2.3 Post-Decrement by 1: (Rn)-, (SP)- 


The address of the operand is in the address register Rn or SP. After the operand address is used, it is 
decremented by one and stored in the same address register. The type of arithmetic (linear or modulo) used 
to decrement Rn is determined by M01 for R0 and R1 and is always linear for R2, R3, and SP. The N 
register is ignored. This reference is classified as a memory reference. See Figure 4-5. 


Post-Decrement Example: MOVE B,X:(R1)- 

[Figure: contents of accumulator B, X memory, N (not used), and M01 ($FFFF) before and after 
execution.] 

Assembler syntax: X:(Rn)-, X:(SP)- 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-5. Address Register Indirect: Post-Decrement 


4.2.2.4 Post-Update by Offset N: (Rn)+N, (SP)+N 


The address of the operand is in the address register Rn or SP. After the operand address is used, the 
contents of the N register are added to Rn and stored in the same address register. The content of N is 
treated as a two’s-complement signed number. The contents of the N register are unchanged. The type of 
arithmetic (linear or modulo) used to update Rn is determined by M01 for R0 and R1 and is always linear 
for R2, R3, and SP. This reference is classified as a memory reference. See Figure 4-6. 


Post-Update by Offset N Example: MOVE Y1,X:(R2)+N 

[Figure: contents of the Y register, X memory, and M01 ($FFFF) before and after execution.] 

Assembler syntax: X:(Rn)+N, X:(SP)+N, P:(Rn)+N 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-6. Address Register Indirect: Post-Update by Offset N 
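

The three updating modes described so far, (Rn)+, (Rn)-, and (Rn)+N, share one pattern: the operand 
address is the current pointer value, and the pointer is modified only after it has been used. The Python 
sketch below models that pattern for linear arithmetic only (M01 = $FFFF). It is an illustration rather 
than DSP56800 code: the names to_signed16 and post_update and the mode labels are invented here, and 
wraparound at 16 bits is assumed from the register width. 

```python
MASK16 = 0xFFFF  # address registers are 16 bits wide


def to_signed16(value):
    """Interpret a 16-bit pattern as a two's-complement signed number."""
    value &= MASK16
    return value - 0x10000 if value & 0x8000 else value


def post_update(pointer, mode, n=0):
    """Return (operand_address, new_pointer) using linear arithmetic.

    mode is a hypothetical label: "inc" for (Rn)+, "dec" for (Rn)-,
    "plus_n" for (Rn)+N. The operand address is always the old pointer.
    """
    deltas = {"inc": 1, "dec": -1, "plus_n": to_signed16(n)}
    if mode not in deltas:
        raise ValueError("unknown mode: " + mode)
    operand_address = pointer & MASK16
    new_pointer = (operand_address + deltas[mode]) & MASK16
    return operand_address, new_pointer
```

For example, with the illustrative values R2 = $2000 and N = $FFFE (that is, -2), 
post_update(0x2000, "plus_n", 0xFFFE) reports the operand at $2000 and a new R2 value of $1FFE. 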


4.2.2.5 Index by Offset N: (Rn+N), (SP+N) 


The address of the operand is the sum of the contents of the address register Rn or SP and the contents of 
the address offset register N. This addition occurs before the operand can be accessed and, therefore, 
inserts an extra instruction cycle. The content of N is treated as a two’s-complement signed number. The 
contents of the Rn and N registers are unchanged by this addressing mode. The type of arithmetic (linear or 
modulo) used to add N to Rn is determined by M01 for R0 and R1 and is always linear for R2, R3, and SP. 
This reference is classified as a memory reference. See Figure 4-7. 


Indexed by Offset N Example: MOVE A1,X:(R0+N) 

[Figure: contents of accumulator A and X memory before and after execution.] 

Assembler syntax: X:(Rn+N), X:(SP+N) 
Additional instruction execution cycles: 1 
Additional effective address program words: 0 

Figure 4-7. Address Register Indirect: Indexed by Offset N 


4.2.2.6 Index by Short Displacement: (SP-xx), (R2+xx) 


This addressing mode contains the 6-bit short immediate index within the instruction word. This field is 
always one-extended to form a negative offset when the SP register is used and is always zero-extended to 
form a positive offset when the R2 register is used. The type of arithmetic used to add the short 
displacement to R2 or SP is always linear; modulo arithmetic is not allowed. This addressing mode 
requires an extra instruction cycle. This reference is classified as an X memory reference. See Figure 4-8. 


Indexed by Short Displacement Example: MOVE A1,X:(R2+3) 

[Figure: contents of accumulator A and X memory before and after execution, with the short immediate 
value taken from the instruction word.] 

Assembler syntax: X:(R2+xx), X:(SP-xx) 
Additional instruction execution cycles: 1 
Additional effective address program words: 0 

Figure 4-8. Address Register Indirect: Indexed by Short Displacement 
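

The difference between zero-extending and one-extending the 6-bit field can be made concrete with a 
short Python sketch. It follows the description above literally: the field is treated as an offset of 0 to 63 
for R2 and as an offset of -64 to -1 for SP. The function names and the pointer values used in the 
comments are assumptions chosen for illustration; the actual instruction encoding is not shown here. 

```python
def decode_short_displacement(field, base):
    """Expand the 6-bit instruction field into a signed offset.

    base is "R2" (field is zero-extended, giving 0..63) or
    "SP" (field is one-extended, giving -64..-1).
    """
    field &= 0x3F
    if base == "R2":
        return field
    if base == "SP":
        return (0xFFC0 | field) - 0x10000
    raise ValueError("short displacement is defined only for R2 and SP")


def short_indexed_address(pointer, field, base):
    """Operand address for X:(R2+xx) or X:(SP-xx); the pointer is not modified."""
    return (pointer + decode_short_displacement(field, base)) & 0xFFFF


# X:(R2+3) with R2 = $1000 gives $1003.
# X:(SP-$1E) with SP = $0100 gives $00E2; the low 6 bits of -$1E are $22.
```

Under this model an SP-relative displacement of zero cannot be expressed with the short form, which is 
consistent with the field always forming a negative offset. 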


4.2.2.7 Index by Long Displacement: (Rn+xxxx), (SP+xxxx) 


This addressing mode contains the 16-bit long immediate index within the instruction word. This second 
word is treated as a signed two’s-complement value. The type of arithmetic (linear or modulo) used to add 
the long displacement to Rn is determined by M01 for R0 and R1 and is always linear for R2, R3, and SP. 
This addressing mode requires two extra instruction cycles. This addressing mode is available for MOVEC 
instructions. This reference is classified as an X memory reference. See Figure 4-9. 


Indexed by Long Displacement Example: MOVE A1,X:(R0+$10CF) 

[Figure: contents of accumulator A, X memory, R0, N, and M01 before and after execution. After 
execution R0 = $7000, N = $4567, M01 = $FFFF, and X memory location $80CF contains $EDCB. The long 
immediate value is taken from the instruction word.] 

Assembler syntax: X:(Rn+xxxx), X:(SP+xxxx) 
Additional instruction execution cycles: 2 
Additional effective address program words: 1 

Figure 4-9. Address Register Indirect: Indexed by Long Displacement 
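

The indexed modes differ from the post-update modes in that the sum is used as the operand address and 
the pointer register is left unchanged. The following Python sketch reproduces the address calculation 
from the long-displacement example (R0 = $7000, displacement $10CF) under linear arithmetic. The 
function name and the EXTRA_COST table are illustrative; the cost figures are copied from the figures in 
Sections 4.2.2.5 through 4.2.2.7. 

```python
MASK16 = 0xFFFF

# (additional instruction cycles, additional program words) per indexed mode
EXTRA_COST = {
    "(Rn+N)": (1, 0),
    "(R2+xx)": (1, 0),
    "(Rn+xxxx)": (2, 1),
}


def indexed_address(pointer, offset):
    """Operand address for an indexed mode under linear arithmetic.

    offset is a 16-bit two's-complement pattern (the N register or the
    extension word). The pointer register itself is left unchanged.
    """
    offset &= MASK16
    signed = offset - 0x10000 if offset & 0x8000 else offset
    return (pointer + signed) & MASK16
```

With these definitions, indexed_address(0x7000, 0x10CF) evaluates to $80CF, the location written in 
the example, and an offset of $FFFF selects the word just below the pointer. 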


4.2.3 Immediate Data Modes 


The immediate data modes specify the operand directly in a field of the instruction. That is, the operand 
value to be used is contained within the instruction word itself (or words themselves). There are two types 
of immediate data modes: immediate data, which uses an extension word to contain the operand, and 
immediate short data, where the operand is contained within the instruction word. Table 4-6 summarizes 
these two modes. 


Table 4-6. Addressing Mode—Immediate 


Addressing Mode:                           Notation in the Instruction   Examples 
Immediate                                  Set Summary 
Immediate short data: 5-, 6-, or 7-bit     #xx                           #14 
  (unsigned and signed)                                                  #<3 
Immediate data: 16-bit                     #xxxx                         #$369C 
  (unsigned and signed)                                                  #>1234 


4.2.3.1 Immediate Data: #xxxx 


This addressing mode requires one word of instruction extension. This additional word contains the 16-bit 
immediate data used by the instruction. This reference is classified as a program reference. Examples of 
the use and effects of immediate-data mode are shown in Figure 4-10 on page 4-18. 


Immediate into 16-Bit Register Example: MOVE #$A987,B1 
Positive Immediate into 36-Bit Accumulator Example: MOVE #$1234,B 
Negative Immediate into 36-Bit Accumulator Example: MOVE #$A987,B 

[Figure: contents of accumulator B before and after execution for each example.] 

Assembler syntax: #xxxx 
Additional instruction execution cycles: 1 
Additional effective address program words: 1 

Figure 4-10. Special Addressing: Immediate Data 


Immediate Short into 16-Bit Address Register Example: MOVE #$0027,N 
Immediate Short into 16-Bit Data Register Example: MOVE #$FFC6,X0 
Immediate Short into 16-Bit Accumulator Register Example: MOVE #$001C,B1 
Positive Immediate Short into 36-Bit Accumulator Example: MOVE #$001C,B 
Negative Immediate Short into 36-Bit Accumulator Example: MOVE #$FFC6,B 

[Figure: register contents before and after execution for each example. N changes from XXXX to 
$0027 and X0 changes from XXXX to $FFC6; the remaining panels show accumulator B.] 

Assembler syntax: #xx 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-11. Special Addressing: Immediate Short Data 


4.2.3.2 Immediate Short Data: #xx 


The immediate-short-data operand is located within the instruction operation word. A 6-bit unsigned 
positive operand is used for DO and REP instructions, and a 7-bit signed operand is used for an immediate 
move to an on-core register instruction. This reference is classified as a program reference. See 
Figure 4-11 on page 4-19. 
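

The 7-bit signed operand explains why values such as $0027 and $FFC6 qualify as short immediates: 
both are the sign extension of a 7-bit field. The Python sketch below illustrates this. The function names 
are invented for the example, and move_to_accumulator encodes an assumption suggested by the 
positive and negative accumulator examples in the figures: that a 16-bit value written to a full 36-bit 
accumulator is sign-extended into the extension register and that the least significant portion is cleared. 

```python
def sign_extend(field, bits):
    """Sign-extend the low `bits` bits of `field` to a Python integer."""
    field &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (field ^ sign_bit) - sign_bit


def move_short_to_register(field7):
    """16-bit register contents after a 7-bit signed immediate move."""
    return sign_extend(field7, 7) & 0xFFFF


def move_to_accumulator(value16):
    """(extension, MSP, LSP) after a 16-bit value is written to a full accumulator.

    Assumption: the value is sign-extended into the 4-bit extension register
    and the least significant portion is cleared.
    """
    value16 &= 0xFFFF
    extension = 0xF if value16 & 0x8000 else 0x0
    return extension, value16, 0x0000
```

In this model the field $46 expands to $FFC6 and the field $27 expands to $0027, so the short form 
covers the signed range -64 through 63. 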


4.2.4 Absolute Addressing Modes 


Similar to the immediate addressing modes, the absolute addressing modes specify a value within the 
instruction or instruction-extension words. Unlike the immediate modes, these values are not used as the 
operands themselves, but are interpreted as absolute data memory addresses for the operand values. The 
different absolute addressing modes are shown in Table 4-7. 


Table 4-7. Addressing Mode—Absolute 


Addressing Mode:                   Notation in the Instruction   Examples 
Absolute                           Set Summary 
Absolute short address: 6-bit      X:aa                          X:$0002 
  (direct addressing)                                            X:<$02 
I/O short address: 6-bit           X:pp                          X:$00FFE3 
  (direct addressing)                                            X:<<$FFE3 
Absolute address: 16-bit           X:xxxx                        X:$00F001 
  (extended addressing)                                          X:>$C002 


4.2.4.1 Absolute Address (Extended Addressing): xxxx 


This addressing mode requires one word of instruction extension, which contains the 16-bit absolute 
address of the operand. No registers are used to form the address of the operand. Absolute address 
instructions are used with the bit-manipulation and move instructions. This reference is classified as a 
memory reference and a program reference. See Figure 4-12. 


Absolute Address Example: MOVE X:$5079,X0 

[Figure: contents of X0 and X memory location $5079 before and after execution. X0 changes from 
XXXX to $1234.] 

Assembler syntax: X:xxxx 
Additional instruction execution cycles: 1 
Additional effective address program words: 1 

Figure 4-12. Special Addressing: Absolute Address 


4.2.4.2 Absolute Short Address (Direct Addressing): <aa> 


For the absolute short addressing mode, the address of the operand occupies 6 bits in the instruction 
operation word and is zero-extended. This allows direct access to the first 64 locations in X memory. No 
registers are used to form the address of the operand. Absolute short instructions are used with the bit-field 
manipulation and move instructions. See Figure 4-13. 


Absolute Short Address Example: MOVE R2,X:<$0003 

[Figure: contents of R2 and X memory before and after execution. R2 holds $ABCD throughout; X 
memory location $0003 changes from XXXX to $ABCD.] 

Assembler syntax: X:<aa> 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-13. Special Addressing: Absolute Short Address 


4.2.4.3 I/O Short Address (Direct Addressing): <pp> 


For the I/O short addressing mode, the address of the operand occupies 6 bits in the instruction operation 
word and is one-extended. This allows direct access to the last 64 locations in X memory, which contain 
the on-chip peripheral registers. No registers are used to form the address of the operand. See Figure 4-14 
for examples of using the I/O short direct addressing mode. 


I/O Short Address Example: MOVE X:<<$FFFB,R3 

[Figure: contents of R3 and X memory before and after execution. R3 changes from XXXX to $5678.] 

Assembler syntax: X:<pp> 
Additional instruction execution cycles: 0 
Additional effective address program words: 0 

Figure 4-14. Special Addressing: I/O Short Address 
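

The two 6-bit short address forms differ only in how the field is extended to 16 bits. The Python sketch 
below shows both expansions and a helper that reports which form could reach a given address. The 
function names are hypothetical, and the helper considers only the address range; whether a particular 
instruction accepts a given form is defined by the instruction set, not by this sketch. 

```python
def absolute_short_address(field):
    """X:<aa: 6-bit field zero-extended, reaching $0000-$003F."""
    return field & 0x3F


def io_short_address(field):
    """X:<<pp: 6-bit field one-extended, reaching $FFC0-$FFFF."""
    return 0xFFC0 | (field & 0x3F)


def shortest_form(address):
    """Name the shortest absolute form whose range includes `address`."""
    address &= 0xFFFF
    if address <= 0x003F:
        return "absolute short"
    if address >= 0xFFC0:
        return "I/O short"
    return "absolute (extended)"
```

For the examples above, the field $03 expands to $0003 and the field $3B expands to $FFFB, while an 
address such as $5079 needs the extended form and its extra program word. 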


4.2.5 Implicit Reference 


Some instructions make implicit reference to the program counter (PC), software stack, hardware stack 
(HWS), loop address register (LA), loop counter (LC), or status register (SR). The implied registers and 
their use are defined by the individual instruction descriptions. See Appendix A, “Instruction Set Details,” 
for more information. 


4.2.6 Addressing Modes Summary 


Table 4-8 on page 4-24 contains a summary of the addressing modes discussed in the preceding 
subsections of Section 4.2. 




Table 4-8. Addressing Mode Summary 


Columns: Addressing Mode; Uses M01 (1); Operand Reference, marked with an X under one or more of 
S (2), C (3), D (4), A (5), P (6), X (7), XX (8); Assembler Syntax. The column positions of the 
operand-reference marks were lost in extraction; each row below retains the marks as found. 

Addressing Mode                      Uses M01   Operand Reference   Assembler Syntax 

Register Direct 
Data or control register             No         X X 
Address register (Rn, SP)            No         X                   Rn 
Address modifier register (M01)      No         X                   M01 
Address offset register (N)          No         X                   N 
Hardware stack (HWS)                 No         X                   HWS 
Software stack                       No         X 

Address Register Indirect 
No update                            No         X                   (Rn) 
Post-increment by 1                  Yes        X X X               (Rn)+ 
Post-decrement by 1                  Yes        X                   (Rn)- 
Post-update by offset N              Yes        X X X               (Rn)+N 
Index by offset N                    Yes        X                   (Rn+N) 
Index by short displacement          No         X                   (R2+xx) or (SP-xx) 
Index by long displacement           Yes        X                   (Rn+xxxx) or (SP+xxxx) 

Immediate, Absolute, and Implicit 
Immediate data                       No         X                   #xxxx 
Immediate short data                 No         X                   #xx 
Absolute address                     No         X X                 xxxx 
Absolute short address               No         X                   <aa> 
I/O short address                    No         X                   <pp> 
Implicit                             No         X X X X 

1. The M01 modifier can only be used on the R0/N/M01 or R1/N/M01 register sets 
2. Hardware stack reference 
3. Program controller register reference 
4. Data ALU register reference 
5. Address Generation Unit register reference 
6. Program memory reference 
7. X memory reference 
8. Dual X memory read 


4.3 AGU Address Arithmetic 


When an arithmetic operation is performed in the address generation unit, it can be performed using either 
linear or modulo arithmetic. Linear arithmetic is used for general-purpose address computation, as found in 
all microprocessors. Modulo arithmetic is used to create data structures in memory such as circular buffers, 
first-in-first-out queues (FIFOs), delay lines, and fixed-size stacks. Using these structures allows data to be 
manipulated simply by updating address register pointers, rather than by moving large blocks of data. 


Linear versus modulo arithmetic is selected using the modifier register, M01. Arithmetic on the R0 and R1 
AGU registers may be performed using either linear or modulo arithmetic. The R2, R3, and SP registers 
can be modified using linear arithmetic only. 


4.3.1 Linear Arithmetic 


Linear arithmetic is “normal” address arithmetic, as found on general-purpose microprocessors. It is 
performed using 16-bit two’s-complement addition and subtraction. The 16-bit offset register N, or 
immediate data (+1, -1, or a displacement value), is used in the address calculations. Addresses are 
normally considered unsigned; offsets are considered signed. 


Linear arithmetic is enabled for the R0 and R1 registers by setting the modifier register (M01) to $FFFF. 
The M01 register is set to $FFFF on reset. 


NOTE: 


To ensure compatibility with future generations of DSP56800-compatible 
DSP devices, care should be taken to avoid address arithmetic operations 
that can cause address register values to overflow. On DSP56800 Family 
chips, register values can be expected to “wrap” appropriately. Future 
generations may support address ranges > 64K, however, causing potential 
address-calculation errors. 


4.3.2 Modulo Arithmetic 


Many DSP and standard control algorithms require the use of specialized data structures, such as circular 
buffers, FIFOs, and stacks. The DSP56800 architecture provides support for these algorithms by 
implementing modulo arithmetic in the address generation unit. 


4.3.2.1 Modulo Arithmetic Overview 


To understand modulo address arithmetic, consider the example of a circular buffer. A circular buffer is a 
block of sequential memory locations with a special property: a pointer into the buffer is limited to the 
buffer’s address range. When a buffer pointer is incremented such that it would point past the end of the 
buffer, the pointer is “wrapped” back to the beginning of the buffer. Similarly, decrementing a pointer that 
is located at the beginning of the buffer will wrap the pointer to the end. This behavior is achieved by 
performing modulo arithmetic when incrementing or decrementing the buffer pointers. See Figure 4-15 on 
page 4-26. 


[Figure: a circular buffer in memory with an address pointer into it. Lower boundary: the “k” LSBs 
of the address are all zeros. Upper boundary: lower boundary + M01. M01 = size of modulo region 
minus one.] 

Figure 4-15. Circular Buffer 


The modulo arithmetic unit in the AGU simplifies the use of a circular buffer by handling the address 
pointer wrapping for you. After establishing a buffer in memory, the R0 and R1 address pointers can be 
made to wrap in the buffer area by programming the M01 register. 


Modulo arithmetic is enabled by programming the M01 register with a value that is one less than the size 
of the circular buffer. See Section 4.3.2.2, “Configuring Modulo Arithmetic,” for exact details on 
programming the M01 register. Once enabled, updates to the R0 or R1 registers using one of the 
post-increment or post-decrement addressing modes are performed with modulo arithmetic, and will wrap 
correctly in the circular buffer. 


The address range within which the address pointers will wrap is determined by the value placed in the 
M01 register and the address contained within one of the pointer registers. Due to the design of the modulo 
arithmetic unit, the address range is not arbitrary, but limited based on the value placed in M01. The lower 
bound of the range is calculated by taking the size of the buffer, rounding it up to the next highest power of 
two, and then rounding the address contained in the R0 or R1 pointers down to the nearest multiple of that 
value. 


For example: for a buffer size of M, a value 2^k is calculated such that 2^k ≥ M. This is the buffer size 
rounded up to the next highest power of two. For a value M of 37, 2^k would be 64. The lower boundary of 
the range in which the pointer registers will wrap is the value in the R0 or R1 register with the low-order k 
bits all set to zero, effectively rounding the value down to the nearest multiple of 2^k (64 in this case). This 
is shown in Figure 4-16 on page 4-27. 
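

The boundary calculation just described can be checked with a few lines of Python. The sketch below 
computes k, the lower boundary, and the upper boundary from a pointer value and a buffer size. The 
function name modulo_region is invented for this example, and the minimum size of 2 is an assumption 
based on the smallest modulus listed for M01. 

```python
def modulo_region(pointer, size):
    """Return (k, lower, upper) for a circular buffer of `size` words.

    k is the smallest integer with 2**k >= size; lower is the pointer with its
    k low-order bits cleared; upper is lower + size - 1.
    """
    if size < 2:
        raise ValueError("buffer size must be at least 2")
    k = (size - 1).bit_length()
    lower = pointer & ~((1 << k) - 1) & 0xFFFF
    upper = lower + size - 1
    return k, lower, upper
```

For the buffer of size 37 with a pointer value of $009F, modulo_region(0x009F, 37) yields k = 6, a 
lower boundary of $0080, and an upper boundary of $00A4. 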


[Figure: circular buffer with lower boundary $0080 (the lower bound relative to R0), initial R0 pointer 
value $009F, and upper boundary $00A4 (lower bound + size - 1 = upper bound). The addresses above 
the upper boundary are marked as unavailable.] 

Figure 4-16. Circular Buffer with Size M=37 


When modulo arithmetic is performed on the buffer pointer register, only the low-order k bits are
modified; the upper 16 - k bits are held constant, fixing the address range of the buffer. The algorithm used
to update the pointer register (R0 in this case) is as follows:


    R0[15:k]  = R0[15:k]
    R0[k-1:0] = (R0[k-1:0] + offset) MOD (M01 + 1)


Note that this algorithm can result in some memory addresses being unavailable. If the size of the buffer is
not an exact power of two, there will be a range of offsets between M and 2^k - 1 (37 and 63 in our
example) that are not addressable. Section 4.3.2.7.3, “Memory Locations Not Available for Modulo
Buffers,” addresses this issue in greater detail.
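
The update rule above can be modeled in a few lines of Python. This is a minimal sketch for checking buffer layouts by hand, not a description of the hardware implementation; the function names (buffer_exponent, buffer_bounds, modulo_update) are illustrative and do not come from this manual.

```python
def buffer_exponent(size):
    """Return the smallest k such that 2**k >= size."""
    return (size - 1).bit_length()


def buffer_bounds(pointer, m01):
    """Return (lower, upper) boundaries implied by a pointer value and M01."""
    size = (m01 & 0x3FFF) + 1           # M01[13:0] holds size - 1
    k = buffer_exponent(size)
    lower = pointer & ~((1 << k) - 1) & 0xFFFF  # clear the low-order k bits
    return lower, lower + size - 1


def modulo_update(pointer, offset, m01):
    """Model the documented rule: upper 16-k bits held, low k bits wrap."""
    size = (m01 & 0x3FFF) + 1
    k = buffer_exponent(size)
    low_mask = (1 << k) - 1
    upper_bits = pointer & ~low_mask & 0xFFFF
    low_bits = ((pointer & low_mask) + offset) % size
    return upper_bits | low_bits


# The buffer of Figure 4-16: size 37, R0 initially $009F
print([hex(a) for a in buffer_bounds(0x009F, 36)])
print(hex(modulo_update(0x00A4, 1, 36)))   # stepping past the upper boundary
```

With M01 = 36 and R0 = $009F, the model places the buffer at $0080 to $00A4, and a post-increment from $00A4 wraps to $0080.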


4.3.2.2 Configuring Modulo Arithmetic 


As noted in Section 4.3.2.1, “Modulo Arithmetic Overview,” modulo arithmetic is enabled by
programming the address modifier register, M01. This single register enables modulo arithmetic for both
the R0 and R1 registers, although in order for modulo arithmetic to be enabled for the R1 register it must
be enabled for the R0 register as well. When both pointers use modulo arithmetic, the sizes of both buffers
are the same. They can refer to the same or different buffers as desired.


The possible configurations of the M01 register are given in Table 4-9.


Table 4-9. Programming M01 for Modulo Arithmetic

    16-Bit M01           Address Arithmetic    Pointer Registers
    Register Contents    Performed             Affected
    -----------------    ------------------    ------------------------------
    $0000                (Reserved)            -
    $0001                Modulo 2              R0 pointer only
    $0002                Modulo 3              R0 pointer only
    ...                  ...                   ...
    $3FFE                Modulo 16383          R0 pointer only
    $3FFF                Modulo 16384          R0 pointer only
    $4000                (Reserved)            -
    ...                  ...                   ...
    $7FFF                (Reserved)            -
    $8000                (Reserved)            -
    $8001                Modulo 2              R0 and R1 pointers
    $8002                Modulo 3              R0 and R1 pointers
    ...                  ...                   ...
    $BFFE                Modulo 16383          R0 and R1 pointers
    $BFFF                Modulo 16384          R0 and R1 pointers
    $C000                (Reserved)            -
    ...                  ...                   ...
    $FFFE                (Reserved)            -
    $FFFF                Linear Arithmetic     R0 and R1 pointers both set up
                                               for linear arithmetic

The high-order two bits of the M01 register determine the arithmetic mode for R0 and R1. A value of 00
for M01[15:14] selects modulo arithmetic for R0. A value of 10 for M01[15:14] selects modulo arithmetic
for both R0 and R1. A value of 11, with all remaining bits set ($FFFF), disables modulo arithmetic. In the
modulo modes, the remaining 14 bits of M01 hold the size of the buffer minus one.
NOTE: 


The reserved values ($0000, $4000-$8000, and $C000-$FFFE) should not
be used. The behavior of the modulo arithmetic unit is undefined for these
values, and may result in erratic program execution.
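
The encoding in Table 4-9 can be checked with a small decoder. The following Python sketch is only an illustration of the table; the function name decode_m01 and the tuple it returns are assumptions made for this example.

```python
def decode_m01(m01):
    """Classify a 16-bit M01 value according to Table 4-9.

    Returns (kind, modulus, registers), where kind is "modulo",
    "linear", or "reserved".
    """
    m01 &= 0xFFFF
    if m01 == 0xFFFF:
        return ("linear", None, ("R0", "R1"))
    mode = m01 >> 14                 # M01[15:14]
    size_minus_one = m01 & 0x3FFF    # M01[13:0]
    if mode in (0b00, 0b10) and size_minus_one != 0:
        registers = ("R0",) if mode == 0b00 else ("R0", "R1")
        return ("modulo", size_minus_one + 1, registers)
    # $0000, $4000-$8000, and $C000-$FFFE are reserved
    return ("reserved", None, ())


for value in (0x0024, 0x8024, 0xFFFF, 0x4000):
    print(hex(value), decode_m01(value))
```

A check like this can be useful in build scripts or test benches to reject reserved values before they are loaded into M01.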

4.3.2.3 Supported Memory Access Instructions


The address generation unit supports modulo arithmetic for the following address-register-indirect modes: 


    (Rn)      (Rn)+
    (Rn)-     (Rn)+N
    (Rn+N)    (Rn+xxxx)


As noted in the preceding discussion, modulo arithmetic is only supported for the R0 and R1 address
registers.


4.3.2.4 Simple Circular Buffer Example 


Suppose a five-location circular buffer is needed for an application. The application locates this buffer at
X:$0800 in memory. (This location is arbitrary; any location in data memory would suffice.) In order to
configure the AGU correctly to manage this circular buffer, the following two pieces of information are
needed:


• The size of the buffer: five words
• The location of the buffer: X:$0800 to X:$0804


Modulo addressing is enabled for the R0 pointer by writing the size minus one ($0004) to M01[13:0], and
00 to M01[15:14]. See Figure 4-17.


[Figure 4-17: five-location circular buffer with its upper boundary at $0804 and R0 pointing into the
buffer. M01 register = size - 1 = 5 - 1 = $0004.]


Figure 4-17. Simple Five-Location Circular Buffer


The location of the buffer in memory is determined by the value of the R0 pointer when it is used to access
memory. The size of the memory buffer (five in this case) is rounded up to the nearest power of two (eight
in this case). The value in R0 is then rounded down to the nearest multiple of eight. For the base address to
be X:$0800, the initial value of R0 must be in the range X:$0800 to X:$0804. Note that the initial value of
R0 does not have to be X:$0800 to establish this address as the lower bound of the buffer. However, it is
often convenient to set R0 to the beginning of the buffer. The source code in Example 4-1 shows the
initialization of the example buffer.


Example 4-1. Initializing the Circular Buffer


    MOVE    #(5-1),M01    ; Initialize the buffer for five locations
    MOVE    #$0800,R0     ; R0 can be initialized to any location
                          ; within the buffer. For simplicity, R0
                          ; is initialized to the value of the lower
                          ; boundary


The buffer is used simply by accessing it with MOVE instructions. The effect of modulo address
arithmetic becomes apparent when the buffer is accessed multiple times, as in Example 4-2 on page 4-30.


Example 4-2. Accessing the Circular Buffer 


    MOVE    X:(R0)+,X0    ; First time accesses location $0800
                          ; and bumps the pointer to location $0801
    MOVE    X:(R0)+,X0    ; Second accesses at location $0801
    MOVE    X:(R0)+,X0    ; Third accesses at location $0802
    MOVE    X:(R0)+,X0    ; Fourth accesses at location $0803
    MOVE    X:(R0)+,X0    ; Fifth accesses at location $0804
                          ; and bumps the pointer to location $0800
    MOVE    X:(R0)+,X0    ; Sixth accesses at location $0800 <=== NOTE
    MOVE    X:(R0)+,X0    ; Seventh accesses at location $0801
    MOVE    X:(R0)+,X0    ; and so forth...


For the first several memory accesses, the buffer pointer is incremented as expected, from $0800 to $0801, 
$0802, and so forth. When the pointer reaches the top of the buffer, rather than incrementing from $0804 to 
$0805, the pointer value “wraps” back to $0800. 


The behavior is similar when the buffer pointer register is incremented by a value greater than one.
Consider the source code in Example 4-3, where R0 is post-incremented by three rather than one. The
pointer register correctly “wraps” from $0803 to $0801. The pointer does not have to land exactly on the
upper or lower bound of the buffer for the modulo arithmetic to wrap the value properly.


Example 4-3. Accessing the Circular Buffer with Post-Update by Three 


    MOVE    #(5-1),M01    ; Initialize the buffer for five locations
    MOVE    #$0800,R0     ; Initialize the pointer to $0800
    MOVE    #3,N          ; Initialize “bump value” to 3
    NOP
    NOP
    MOVE    X:(R0)+N,X0   ; First time accesses location $0800
                          ; and bumps the pointer to location $0803
    MOVE    X:(R0)+N,X0   ; Second accesses at location $0803
                          ; and wraps the pointer around to $0801
    MOVE    X:(R0)+N,X0   ; Third accesses at location $0801
                          ; and bumps the pointer to location $0804
    MOVE    X:(R0)+N,X0   ; Fourth accesses at ...


In addition, the pointer register does not need to be incremented; it could be decremented instead.
Instructions that post-decrement the buffer pointer also work correctly. Executing the instruction
MOVE X:(R0)-,X0 when the value of R0 is $0800 will correctly set R0 to $0804.
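
The access sequences in Example 4-2 and Example 4-3 can be reproduced with a short Python model of the five-location buffer. This is a sketch under the assumption that the pointer always starts inside the buffer; the helper names wrap and trace are invented for this illustration.

```python
BASE = 0x0800   # lower boundary of the example buffer
SIZE = 5        # five locations, so M01 = SIZE - 1 = $0004


def wrap(pointer, offset):
    """Post-update a pointer inside the buffer with modulo arithmetic."""
    return BASE + (pointer - BASE + offset) % SIZE


def trace(start, offset, count):
    """Return the addresses touched by `count` accesses of X:(R0)+offset."""
    addresses = []
    pointer = start
    for _ in range(count):
        addresses.append(pointer)       # the access uses the current pointer
        pointer = wrap(pointer, offset)  # then the pointer is post-updated
    return addresses


print([hex(a) for a in trace(0x0800, 1, 7)])   # Example 4-2, (R0)+
print([hex(a) for a in trace(0x0800, 3, 4)])   # Example 4-3, (R0)+N with N = 3
print(hex(wrap(0x0800, -1)))                   # post-decrement from $0800
```

The first trace visits $0800 through $0804 and then returns to $0800; the second visits $0800, $0803, $0801, and $0804, matching the comments in Example 4-3.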




4.3.2.5 Setting Up a Modulo Buffer 


The following steps detail the process of setting up and using the 37-location circular buffer shown in 
Figure 4-16 on page 4-27. 


1. Determine the value for the M01 register.


— Select the size of the desired buffer; it can be no larger than 16,384 locations. If modulo
arithmetic is to be enabled only for the R0 address register, this gives the following:
M01 = # locations - 1 = 37 - 1 = 36 = $0024


— If modulo arithmetic is to be enabled for both the R0 and R1 address registers, be sure to set the
high-order bit of M01:
M01 = # locations - 1 + $8000 = 37 - 1 + 32768 = 32804 = $8024

2. Find the nearest power of two greater than or equal to the circular buffer size. In this
example, the value would be 2^k ≥ 37, which gives us a value of k = 6.


3. From k, derive the characteristics of the lower boundary of the circular buffer. Since the k
least-significant bits of the address of the lower boundary must all be 0s, the buffer
base address must be some multiple of 2^k. In this case, k = 6, so the base address is some
multiple of 2^6 = 64.


4. Locate the circular buffer in memory. 


— The location of the circular buffer in memory is determined by the upper 16 - k bits of the
address pointer register used in a modulo arithmetic operation. If there is an open area of
memory from locations 111 to 189 ($006F to $00BD), for example, then the addresses of the
lower and upper boundaries of the circular buffer will fit in this open area for J = 2:

Lower boundary = (J x 64) = (2 x 64) = 128 = $0080
Upper boundary = (J x 64) + 36 = (2 x 64) + 36 = 164 = $00A4


— The exact area of memory in which a circular buffer is prepared is specified by picking a value
for the address pointer register, R0 or R1, whose value is inclusively between the desired lower
and upper boundaries of the circular buffer. Thus, selecting a value of 139 ($008B) for R0
would locate the circular buffer between locations 128 and 164 ($0080 to $00A4) in memory
since the upper 10 (16 - k) bits of the address indicate that the lower boundary is 128 ($0080).


— In summary, the size and exact location of the circular buffer are defined once a value is assigned
to the M01 register and to the address pointer register (R0 or R1) that will be used in a modulo
arithmetic calculation.


5. Determine the upper boundary of the circular buffer, which is the lower boundary + # 
locations - 1. 


6. Select a value for the offset register if it is used in modulo operations.


— If the offset register is used in a modulo arithmetic calculation, it must be selected as follows:
|N| ≤ M01 + 1 [where |N| refers to the absolute value of the contents of the offset register]


— The special case where N is a multiple of the block size, 2^k, is discussed in Section 4.3.2.6,
“Wrapping to a Different Bank.”


7. Perform the modulo arithmetic calculation. 


— Once the appropriate registers are set up, the modulo arithmetic operation occurs when an
instruction with any of the following addressing modes using the R0 (or R1, if enabled) register
is executed:

(Rn) 
(Rn)+ 
(Rn)- 
(Rn)+N 
(Rn+N) 
(Rn+xxxx) 


— If the result of the arithmetic calculation would exceed the upper or lower bound, then wrapping
around is correctly performed.
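
Steps 1 through 5 amount to a small calculation, sketched below in Python. The function plan_modulo_buffer and the keys of the dictionary it returns are hypothetical names chosen for this example; the arithmetic follows the steps above.

```python
def plan_modulo_buffer(locations, pointer, both_pointers=False):
    """Derive M01, k, and the buffer boundaries for a modulo buffer.

    locations     -- buffer size in words (2 to 16,384)
    pointer       -- initial value chosen for R0 (or R1)
    both_pointers -- True to enable modulo arithmetic for R0 and R1
    """
    if not 2 <= locations <= 16384:
        raise ValueError("buffer size must be between 2 and 16,384 locations")
    m01 = (locations - 1) | (0x8000 if both_pointers else 0)   # step 1
    k = (locations - 1).bit_length()                           # step 2: 2**k >= size
    block = 1 << k                                             # step 3
    lower = pointer & ~(block - 1) & 0xFFFF                    # step 4
    upper = lower + locations - 1                              # step 5
    return {
        "m01": m01,
        "k": k,
        "lower": lower,
        "upper": upper,
        "pointer_in_buffer": lower <= pointer <= upper,
    }


plan = plan_modulo_buffer(37, 0x008B)
print({name: hex(v) if isinstance(v, int) and not isinstance(v, bool) else v
       for name, v in plan.items()})
```

The pointer_in_buffer flag is a convenience check: an initial pointer that falls in the unavailable region above the upper boundary is the situation described in Section 4.3.2.7.1.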


4.3.2.6 Wrapping to a Different Bank 


For the normal case where |N| is less than or equal to M01, the primary address arithmetic unit will
automatically wrap the address pointer around by the required amount. This type of address modification is
useful in creating circular buffers for FIFOs, delay lines, and sample buffers up to 16,384 words long. It is
also used for decimation, interpolation, and waveform generation.

If |N| is greater than M01, the result is data dependent and unpredictable except for the special case where
N = L*(2^k), a multiple of the block size, 2^k, where L is a positive integer. For this special case when using
the (Rn)+N addressing mode, the pointer Rn will be updated using linear arithmetic to the same relative
address that is L blocks forward in memory (see Figure 4-18). Note that this case requires that the offset N
be a positive two’s-complement integer.


[Figure 4-18: (Rn)+N MOD M01, where N = 2^k (L = 1).]


Figure 4-18. Linear Addressing with a Modulo Modifier


This technique is useful in sequentially processing multiple tables or N-dimensional arrays. The special
modulo case of (Rn)+N with N = L*(2^k) is useful for performing the same algorithm on multiple blocks of
data in memory (e.g., implementing a bank of parallel IIR filters).
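
The special case can be illustrated with a short Python sketch. It assumes only what is stated above: when N = L*(2^k), the pointer advances linearly by N and keeps the same offset within its block. The function name advance_blocks is invented for this example.

```python
def advance_blocks(pointer, blocks, m01):
    """Model (Rn)+N with N = L * 2**k: move L blocks forward in memory.

    Returns (new_pointer, n), where n is the offset register value used.
    """
    if blocks < 1:
        raise ValueError("L must be a positive integer")
    size = (m01 & 0x3FFF) + 1
    k = (size - 1).bit_length()        # block size is 2**k
    n = blocks << k                    # N = L * 2**k
    return (pointer + n) & 0xFFFF, n


# 37-location buffers (k = 6), pointer at $008B, one block forward
new_pointer, n = advance_blocks(0x008B, 1, 36)
print(hex(new_pointer), n)
```

In this example the pointer moves from $008B to $00CB with N = 64, so it lands at the same relative position (offset $0B) in the next 64-word block.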


4.3.2.7 Side Effects of Modulo Arithmetic 


Due to the way modulo arithmetic is implemented by the DSP56800 Family, there are some side effects of
using modulo arithmetic that must be kept in mind. Specifically, since the base address of a buffer must be
a multiple of a power of two, and since the modulo arithmetic unit can only detect a single wraparound,
there are some restrictions and limitations that must be considered.


4.3.2.7.1 When a Pointer Lies Outside a Modulo Buffer


If a pointer is outside the valid modulo buffer range and an operation occurs that causes R0 or R1 to be
updated, the contents of the pointer will be updated according to modulo arithmetic rules. For example, a
MOVE B,X:(R0)+N instruction, where R0 = 6, M01 = 5, and N = 0, would apparently leave R0 unchanged
since N = 0. However, since R0 is above the upper boundary, the AGU calculates R0 + N - (M01 + 1) for
the new contents of R0 and sets R0 = 0.
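
The example above can be reproduced with a model that applies at most one correction of M01 + 1 per update. This is an assumption made for illustration, based on the statement that the unit detects only a single wraparound; it is not a specification of the AGU, and the function name single_wrap_update is hypothetical.

```python
def single_wrap_update(pointer, n, m01):
    """Post-update with at most one wraparound correction (assumed model)."""
    size = (m01 & 0x3FFF) + 1
    k = (size - 1).bit_length()
    low_mask = (1 << k) - 1
    upper_bits = pointer & ~low_mask & 0xFFFF
    low = (pointer & low_mask) + n
    if low > size - 1:       # above the upper boundary: subtract M01 + 1 once
        low -= size
    elif low < 0:            # below the lower boundary: add M01 + 1 once
        low += size
    return upper_bits | (low & low_mask)


# R0 = 6 lies just above a six-location buffer (M01 = 5); N = 0
print(single_wrap_update(6, 0, 5))
```

Under this model the update with N = 0 still moves R0 from 6 to 0, which is the behavior described above.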


4.3.2.7.2 Restrictions on the Offset Register 


The modulo arithmetic unit in the AGU is only capable of detecting a single wraparound of an address
pointer. As a result, if the post-update addressing mode, (Rn)+N, is used, care must be taken in selecting
the value of N. The 16-bit absolute value |N| must be less than or equal to M01 + 1 for proper modulo
addressing. Values of |N| larger than the size of the buffer may result in the Rn address value wrapping
twice, which the AGU cannot detect.

4.3.2.7.3 Memory Locations Not Available for Modulo Buffers


For cases where the size of a buffer is not a power of two, there will be a range of memory locations
immediately after the buffer that are not accessible with modulo addressing. Lower boundaries for modulo
buffers always begin on an address where the lowest k bits are zeros, that is, on a multiple of 2^k. This
means that for buffers whose size is not an exact power of two, there are locations above the upper
boundary that are not accessible through modulo addressing.


In Figure 4-16 on page 4-27, for example, the buffer size is 37, which is not a power of two. The smallest
power of two greater than 37 is 64. Thus, there are 64 - 37 = 27 memory locations which are not accessible
with modulo addressing. These 27 locations are between the upper boundary + 1 = $00A5 and the next
power of two boundary address - 1 = $00C0 - 1 = $00BF.


These locations are still accessible when no modulo arithmetic is performed. Using linear addressing (with 
the R2 or R3 pointers), absolute addresses, or the no-update addressing mode makes these locations 
available. 
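
The size and position of the unavailable region follow directly from the buffer size and lower boundary. The Python sketch below reproduces the arithmetic of the example; the function name unavailable_range is illustrative only.

```python
def unavailable_range(lower_boundary, locations):
    """Return (count, first, last) for the locations modulo addressing skips.

    first and last are None when the buffer size is an exact power of two.
    """
    k = (locations - 1).bit_length()
    block = 1 << k                      # buffer size rounded up to 2**k
    count = block - locations
    if count == 0:
        return 0, None, None
    first = lower_boundary + locations  # upper boundary + 1
    last = lower_boundary + block - 1   # next block boundary - 1
    return count, first, last


count, first, last = unavailable_range(0x0080, 37)
print(count, hex(first), hex(last))
```

For the buffer of Figure 4-16 this gives 27 locations, from $00A5 to $00BF. For the five-location buffer at $0800 it gives three locations, $0805 to $0807.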


4.4 Pipeline Dependencies 


There are some cases within the address generation unit where the pipelined nature of the DSP core can
affect the execution of a sequence of instructions. The pipeline dependencies are caused by a write to an
AGU register immediately followed by an instruction that uses that same register in an address arithmetic
calculation. When there is a dependency caused by a write to the N register, the DSP automatically stalls
the pipeline one cycle. If a dependency is caused by a write to the R0-R3, SP, or M01 registers, however,
there is no pipeline stall. This is also true if a bit-field operation is performed on the N register. In these
cases, the user must take care to avoid the dependency by rearranging the instructions or by inserting a
NOP instruction to break the instruction sequence.


Several instruction sequences are presented in the following examples to examine cases where a
pipeline dependency occurs, how this affects the machine, and how to correctly program to avoid these
dependencies.
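
The rules illustrated by Examples 4-4 through 4-11 can be summarized as a small decision function. The Python sketch below is a simplified reading of those examples, not a model of the assembler: the function name classify_pair, its arguments, and the result strings are all assumptions, and treating a write to M01 as affecting any address calculation that uses R0 or R1 is an inference from Example 4-11. The SP and JSR case of Example 4-12 is not covered.

```python
POINTER_REGISTERS = {"R0", "R1", "R2", "R3", "SP"}


def classify_pair(written, address_registers, mode=None, bitfield_write=False):
    """Classify a write followed immediately by a memory access.

    written           -- AGU register written by the first instruction
    address_registers -- registers the second instruction uses in its
                         address calculation (empty if there is none)
    mode              -- addressing mode of the second instruction
    bitfield_write    -- True if the write is BFSET, BFCLR, or BFCHG

    Returns "none", "stall" (one cycle inserted automatically), or
    "flagged" (not allowed; rearrange the code or insert a NOP).
    """
    used = set(address_registers)
    if written == "N":
        if "N" not in used:
            return "none"
        return "flagged" if bitfield_write else "stall"
    if written in POINTER_REGISTERS:
        if written not in used or mode == "(Rn+xxxx)":
            return "none"
        return "flagged"
    if written == "M01":
        return "flagged" if used & {"R0", "R1"} else "none"
    return "none"


print(classify_pair("N", {"R2", "N"}, "(Rn)+N"))        # as in Example 4-8
print(classify_pair("R2", {"R2"}, "(Rn)+"))             # as in Example 4-10
print(classify_pair("R1", {"R1"}, "(Rn+xxxx)"))         # as in Example 4-7
```

A function like this could be used as a quick review aid when hand-scheduling instruction sequences; the assembler remains the authority on which sequences are flagged.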


In Example 4-4 there is no pipeline dependency since the N register is not used in the second instruction. 
Since there is no dependency, no extra instruction cycles are inserted. 


Example 4-4. No Dependency with the Offset Register 


    MOVE    #$7,N          ; Write to the N register
    MOVE    X:(R2)+,X0     ; N not used in this instruction


In Example 4-5 there is no pipeline dependency since the R2 and N registers, used in the address 
calculation, are not written in the previous instruction. Since there is no dependency, no extra instruction 
cycles are inserted. 


Example 4-5. No Dependency with an Address Pointer Register 


    MOVE    #$7,R1         ; Write to R1 register
    MOVE    X:(R2)+N,X0    ; R1 not used in this instruction


In Example 4-6 there is no pipeline dependency since there is no address calculation performed in the 
second instruction. Instead, the R1 register is used as the source operand in a MOVE instruction, for which 
there is no pipeline dependency. Since there is no dependency, no extra instruction cycles are inserted. 

Example 4-6. No Dependency with No Address Arithmetic Calculation


    MOVE    #$7,R1         ; Write to R1 register
    MOVE    R1,X:$0004     ; No address arithmetic calculation
                           ; performed


Example 4-7 represents a special case. For the X:(Rn+xxxx) addressing mode, there is no pipeline
dependency even if the same Rn register is written on the previous cycle. This is true for R0-R3 as well as
the SP register. Since there is no dependency, no extra instruction cycles are inserted.


Example 4-7. No Dependency with (Rn+xxxx) 


    MOVE    #$7,R1             ; Write to R1 register
    MOVE    X:(R1+$3456),X0    ; X:(Rn+xxxx) addressing mode
                               ; using R1


In Example 4-8 there is a pipeline dependency since the N register is used in the second instruction. This is
true for using N to update R0-R3 as well as the SP register. For the case where a dependency is caused by
a write to the N register, the DSP core automatically stalls the pipeline by inserting one extra instruction
cycle. Thus, this sequence is allowed. This dependency also exists for the (Rn+N) addressing mode.


Example 4-8. Dependency with a Write to the Offset Register 


    MOVE    #$7,N          ; Write to the N register
    MOVE    X:(R2)+N,X0    ; N register used in address
                           ; arithmetic calculation


In Example 4-9 there is a pipeline dependency since the N register is used in the second instruction. This is
true for using N to update R0-R3 as well as the SP register. For the case where a dependency is caused by
a bit-field operation on the N register, this sequence is not allowed and is flagged by the assembler. This
sequence may be fixed by rearranging the instructions or inserting a NOP between the two instructions.
This dependency only applies to the BFSET, BFCLR, or BFCHG instructions. There is no dependency for
the BFTSTH, BFTSTL, BRCLR, or BRSET instructions. This dependency also exists for the (Rn+N)
addressing mode.


Example 4-9. Dependency with a Bit-Field Operation on the Offset Register 


    BFSET   #$7,N          ; Bit-field operation on the N
                           ; register
    MOVE    X:(R2)+N,X0    ; N register used in address
                           ; arithmetic calculation


In Example 4-10 there is a pipeline dependency since the address pointer register written in the first 
instruction is used in an address calculation in the second instruction. For the case where a dependency is 
caused by a write to one of these registers, this sequence is not allowed and is flagged by the assembler. 
This sequence may be fixed by rearranging the instructions or inserting a NOP between the two 
instructions. 


Example 4-10. Dependency with a Write to an Address Pointer Register 


    MOVE    #$7,R2         ; Write to the R2 register
    MOVE    X:(R2)+,X0     ; R2 register used in address
                           ; arithmetic calculation


In Example 4-11 there is a pipeline dependency since the M01 register written in the first instruction is
used in an address calculation in the second instruction. For the case where a dependency is caused by a
write to the M01 register, this sequence is not allowed and is flagged by the assembler. This sequence may
be fixed by rearranging the instructions or inserting a NOP between the two instructions.

Example 4-11. Dependency with a Write to the Modifier Register


    MOVE    #$7,M01        ; Write to the M01 register
    MOVE    X:(R0)+,X0     ; M01 register used in address
                           ; arithmetic calculation


In Example 4-12 there is a pipeline dependency since the SP register written in the first instruction is used 
by the immediately following JSR instruction to store the subroutine return address. The stack pointer will 
not be updated with the immediate value in this case. This sequence may be fixed by inserting a NOP 
between the two instructions. 


Example 4-12. Dependency with a Write to the Stack Pointer Register 


MOVE #$3800,SP ; Write to the SP register 
JSR LABEL ; SP implicitly used to save the return address 
; of the subroutine call 


In Example 4-13 there is a pipeline dependency due to contention in the LF bit of the SR register. During 
the first execution cycle of the BFSET instruction, the SR, whose LF bit is zero, is read. At the same time, 
the first operand of the DO instruction is fetched. During the second execution cycle of the BFSET 
instruction, the SR’s content is modified and written back to the SR. This is also the DO instruction decode 
cycle, when the LF bit is set. In this case, the LF bit is first set by the DO decode, then cleared by the 
BFSET SR modification. A cleared LF bit signals the end of a DO loop, so the DO loop is executed only 
once. This sequence can be fixed by inserting a NOP instruction between these two instructions. 


Example 4-13. Dependency with a Bit-Field Operation and DO Loop 


BFSET #$0200,SR ; Write to the SR register 
DO #8, ENDLOOP ; Repeat 8 times body of loop 
ENDLOOP 

Chapter 5
Program Controller 


The program controller unit is one of the three execution units in the central processing module. The
program controller performs the following:


• Instruction fetching
• Instruction decoding
• Hardware DO and REP loop control
• Exception (interrupt) processing


This section covers the following:


• The architecture and programming model of the program controller
• The operation of the software stack
• A discussion of program looping


Details of the instruction pipeline and the different processing states of the DSP chip, including reset and 
interrupt processing, are covered in Chapter 7, “Interrupts and the Processing States.” 


5.1 Architecture and Programming Model 


A block diagram of the program controller is shown in Figure 5-1 on page 5-2, and its corresponding 
programming model is shown in Figure 5-2 on page 5-3. The programmer views the program controller as 
consisting of five registers and a hardware stack (HWS). In addition to the standard program flow-control 
resources such as a program counter (PC) and status register (SR), the program controller features registers 
dedicated to supporting the hardware DO loop instruction—loop address (LA), loop counter (LC), and the 
hardware stack—and an operating mode register (OMR) defining the DSP operating modes. 


The blocks and registers within the program controller are explained in the following subsections. 

Figure 5-1. Program Controller Block Diagram


Figure 5-2. Program Controller Programming Model


5.1.1 Program Counter 


The program counter (PC) is a 16-bit register that contains the address of the next location to be fetched 
from program memory space. The PC may point to instructions, data operands, or addresses of operands. 
Reference to this register is always implicit and is implied by most instructions. This special-purpose 
address register is stacked when hardware DO looping is initiated (on the hardware stack), when a jump to 
a subroutine is performed (on the software stack), and when interrupts occur (on the software stack). 


5.1.2 Instruction Latch and Instruction Decoder 


The instruction latch is a 16-bit internal register used to hold all instruction opcodes fetched from memory. 
The instruction decoder, in turn, uses the contents of the instruction latch to generate all control signals 
necessary for pipeline control—for normal instruction fetches, jumps, branches, and hardware looping. 


5.1.3 Interrupt Control Unit 


The interrupt control unit receives all interrupt requests, arbitrates among them, and then checks the
highest-priority interrupt request against the interrupt mask bits for the DSP core (I1 and I0 in the SR). If
the requesting interrupt has higher priority than the current priority level of the DSP core, then exception
processing begins. When exception processing begins, the interrupt control unit provides the address of the
interrupt vector for interrupts generated on the DSP core, whereas the peripherals generate the vector
address for interrupts generated by an on-chip peripheral.


Interrupts have a simple priority structure with levels zero or one. Level 0 is the lowest interrupt priority 
level (IPL) and is maskable. Level 1 is the highest level and is not maskable. Two interrupt mask bits in the 
SR reflect the current IPL of the DSP core and indicate the level needed for an interrupt source to interrupt 
the processor. 


The DSP56800 core provides support for internal (on-chip) peripheral interrupts and two external interrupt 
sources, IRQA and IRQB. The interrupt control unit arbitrates between interrupt requests generated 
externally and by the on-chip peripherals. 


Asserting the reset pin causes the DSP core to enter the reset processing state. This has higher priority and 
overrides any activity in the interrupt control unit and the exception processing state. 

Details of interrupt arbitration and the exception processing state are discussed in Section 7.3, “Exception
Processing State,” on page 7-5. The reset processing state is discussed in Section 7.1, “Reset Processing
State,” on page 7-1.


5.1.4 Looping Control Unit 


The looping control unit provides hardware dedicated to support loops, which are frequent constructs in 
DSP algorithms. 


The repeat instruction (REP) loads the 13-bit LC register with a value representing the number of times the 
next instruction is to be repeated. The instruction to be repeated is only fetched once per loop, so power 
consumption is reduced, and throughput is increased when running from external program memory by 
decreasing the number of external fetches required. 


The DO instruction loads the 13-bit LC register with a value representing the number of times the loop 
should be executed, loads the LA register with the address of the last instruction word in the loop (fetched 
only once per loop), and sets the loop flag (LF) bit in the SR. The top-of-loop address is stacked on the 
HWS so the loop can be repeated with no overhead. When the LF in the SR is asserted, the loop state 
machine will compare the PC contents to the contents of the LA to determine if the last instruction word in 
the loop was fetched. If the last word was fetched, the LC contents are tested for one. If LC is not equal to 
one, then it is decremented, and the contents of the HWS (the address of the first instruction in the loop) 
are read into the PC, effectively executing an automatic branch to the top of the loop. If the LC is equal to 
one, then the LF in the SR is restored with the contents of the OMR’s nested looping (NL) bit, the 
top-of-loop address is removed from the HWS, and instruction fetches continue at the incremented PC 
value (LA + 1). 
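
The end-of-loop test described above can be sketched as a small simulation. The Python model below makes several simplifying assumptions that are not part of this manual: every instruction occupies one word, the loop is not nested (so LF is restored to 0), and the loop count is at least one (the zero case is discussed in Section 5.3). The function name run_do_loop is illustrative.

```python
LC_MAX = 0x1FFF   # LC is a 13-bit register


def run_do_loop(count, first_address, last_address):
    """Simulate the fetch sequence of a single, non-nested DO loop.

    Returns (fetched_addresses, final_pc, final_lc).
    """
    if not 1 <= count <= LC_MAX:
        raise ValueError("this sketch only handles counts from 1 to 8191")
    if last_address < first_address:
        raise ValueError("the loop must end at or after its first word")
    lc = count                 # DO loads LC with the loop count
    la = last_address          # ... and LA with the last word of the loop
    hws = [first_address]      # top-of-loop address pushed on the HWS
    lf = True                  # loop flag set by DO
    pc = first_address
    fetched = []
    while lf:
        fetched.append(pc)
        if pc != la:
            pc += 1            # ordinary sequential fetch
        elif lc != 1:
            lc -= 1            # not the last pass: decrement and branch
            pc = hws[-1]       # back to the top of the loop
        else:
            hws.pop()          # last pass: purge the HWS entry
            lf = False         # LF restored from NL (assumed 0 here)
            pc = la + 1        # continue after the loop
    return fetched, pc, lc


fetched, pc, lc = run_do_loop(3, 0x0100, 0x0102)
print(len(fetched), hex(pc), lc)
```

For a three-word loop body executed three times, the model fetches nine words, leaves LC at one, and continues at LA + 1.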


Nested loops are supported by stacking the address of the first instruction in the loop (top of loop) in the 
HWS and copying the LF bit into the OMR’s NL bit prior to the execution of the first instruction in the 
loop. The user, however, must explicitly stack the LA and LC registers as described in Section 8.6.4, 
“Nested Loops,” on page 8-22. 


Looping is described in more detail in Section 5.3, “Program Looping,” and Section 8.6, “Loops,” on page 
8-20. 


5.1.5 Loop Counter 


The loop counter (LC) is a special 13-bit down counter used to specify the number of times to repeat a 
hardware program loop (DO and REP loops). When the end of a hardware program loop is reached, the 
contents of the loop counter register are tested for one. If the loop counter is one, the program loop is 
terminated. If the loop counter is not one, it is decremented by one and the program loop is repeated. 


The loop counter may be read and written under program control. This gives software programs access to 
the value of the current loop iteration. It also allows for saving and restoring the LC to and from the 
software stack when nesting DO loops in software. Note that since the LC is only a 13-bit counter, it is 
zero-extended when read; when written, the top three bits of the source word are ignored. This is shown in 
Figure 5-3 on page 5-5. 

Figure 5-3. Accessing the Loop Count Register (LC)


This register is not stacked by a DO instruction and not unstacked by end-of-loop processing, as is done on 
other Motorola DSPs. Section 5.3, “Program Looping,” discusses what occurs when the loop count is zero. 
See Section 8.6.4, “Nested Loops,” on page 8-22 for a discussion of nesting loops in software. 


The upper three bits of this register will read as zero during DSP read operations and should be written as 
zero to ensure future compatibility. 
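
The read and write behavior of the 13-bit LC register can be expressed with a single mask. The following Python lines are a minimal sketch of that behavior; the helper names write_lc and read_lc are invented for this example.

```python
LC_MASK = 0x1FFF   # low-order 13 bits


def write_lc(source_word):
    """Value held in LC after a 16-bit write: the top three bits are ignored."""
    return source_word & LC_MASK


def read_lc(lc):
    """16-bit value returned by a read of LC: zero-extended to 16 bits."""
    return lc & LC_MASK


print(hex(write_lc(0xE005)), hex(read_lc(write_lc(0xFFFF))))
```

Writing $E005 therefore stores a count of 5, and a read never returns a value larger than $1FFF.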


5.1.6 Loop Address 


The loop address (LA) register indicates the location of the last instruction word in a hardware program
loop (DO loop only). When the instruction word at the address contained in this register is fetched, the LC
is checked. If it is not equal to one, the LC is decremented, and the next instruction is taken from the
address at the top of the hardware stack; otherwise the PC is incremented, the LF is restored with the
value in the OMR’s NL bit, one location from the hardware stack is purged, and instruction execution
continues with the instruction immediately after the loop.


The LA register is a read/write register written into by the DO instruction. The LA register can be directly 
accessed by the MOVE instructions as well. This also allows for saving and restoring the LA to and from 
the stack during the nesting of loops. This register is not stacked by a DO instruction and is not unstacked 
by end-of-loop processing. See Section 8.6.4, “Nested Loops,” on page 8-22 for a discussion of nesting 
loops in software. 

5.1.7 Hardware Stack


The hardware stack (HWS) is a 2-deep, 16-bit wide, last-in-first-out (LIFO) stack. It is used for supporting 
hardware DO looping; the software stack is used for storing return addresses and the SR for subroutines 
and interrupts. 


When a DO instruction is executed, the 16-bit address of the first instruction in the DO loop is pushed onto 
the hardware stack, the value of the LF bit is copied into the NL bit, and the LF bit is set. Each ENDDO 
instruction or natural end-of-loop will pop and discard the 16-bit address stored in the top location of the 
hardware stack, copy the NL bit into the LF bit, and clear the NL bit. One hardware stack location is used 
for each nested DO loop, and the REP instruction does not use the hardware stack. Thus, a two-deep 
hardware stack allows for a maximum of two nested DO loops and a nested REP loop within a program. 
Note that this includes any looping that may occur due to a DO loop in an interrupt service routine. 


When a write to the hardware stack would cause the stack limit to be exceeded, the write does not take 
place, and a non-maskable hardware-stack-overflow interrupt occurs. There is no interrupt on hardware 
stack underflow. 
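
The push and pop behavior described above, together with the LF and NL bookkeeping, can be modeled as follows. This Python class is a sketch for reasoning about nesting depth. The class and method names are hypothetical, and leaving LF and NL unchanged when an overflow occurs is an assumption, since the text above only states that the write does not take place.

```python
class HardwareStackModel:
    """Two-deep hardware stack with the LF and NL bits."""

    DEPTH = 2

    def __init__(self):
        self.entries = []
        self.lf = False
        self.nl = False
        self.overflow_interrupt = False

    def do(self, top_of_loop):
        """Model a DO instruction; returns False on stack overflow."""
        if len(self.entries) == self.DEPTH:
            self.overflow_interrupt = True   # the write does not take place
            return False
        self.entries.append(top_of_loop & 0xFFFF)
        self.nl = self.lf                    # LF is copied into NL
        self.lf = True                       # and LF is set
        return True

    def end_loop(self):
        """Model ENDDO or a natural end of loop."""
        if self.entries:                     # no interrupt on underflow
            self.entries.pop()
        self.lf = self.nl                    # NL is copied into LF
        self.nl = False                      # and NL is cleared


hws = HardwareStackModel()
print(hws.do(0x0100), hws.do(0x0110), hws.do(0x0120), hws.overflow_interrupt)
```

Two nested DO loops succeed; the third push is refused and raises the overflow condition, which is why a DO loop in an interrupt service routine counts against the same two-level limit.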


5.1.8 Status Register 


The status register (SR) is a 16-bit register consisting of an 8-bit mode register (MR) and an 8-bit condition 
code register (CCR). The MR register is the high-order 8 bits of the SR; the CCR register is the low-order 
8 bits. 
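
As a small illustration of this layout, the following Python sketch splits a 16-bit SR value into its two halves and joins them again. The function names split_sr and join_sr are illustrative only.

```python
def split_sr(sr):
    """Return (MR, CCR): the high-order and low-order 8 bits of the SR."""
    return (sr >> 8) & 0xFF, sr & 0xFF


def join_sr(mr, ccr):
    """Rebuild the 16-bit SR from its MR and CCR halves."""
    return ((mr & 0xFF) << 8) | (ccr & 0xFF)


mr, ccr = split_sr(0x0301)
print(hex(mr), hex(ccr), hex(join_sr(mr, ccr)))
```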


The mode register is a special-purpose register that defines the operating state of the DSP core. It is
conveniently located within the SR so that it is stacked correctly on an interrupt. This allows an interrupt
service routine to set up the operating state of the DSP core differently.


The mode register bits are affected by processor reset, exception processing, DO, ENDDO, any type of 
jump or branch, RTI, RTS, and SWI instructions, and instructions that directly reference the MR register. 
During processor reset, the interrupt mask bits of the mode register will be set, and the LF bit and program 
extension bits will be cleared. 


The condition code register is a special-purpose control register that defines the current status of the 
processor at any given time. Its bits are set as a result of status detected after certain instructions are 
executed. The CCR bits are affected by data ALU operations, bit-field manipulation instructions, the 
TSTW instruction, parallel move operations, and instructions that directly reference the CCR register. In 
addition, the computation of the C, V, N, and Z condition code bits is affected by the OMR’s CC bit, 
which specifies whether condition codes are generated using the information in the extension register. The 
CCR bits are not affected by data transfers over CGDB unless data limiting occurs when reading the A or 
B accumulators. During processor reset, all CCR bits are cleared. The standard definitions of the CCR bits 
are given in the following subsections, and more information about condition code bits is found in 
Section 3.6, “Condition Code Generation,” on page 3-33. Refer to Appendix A, “Instruction Set Details,” 
for computation rules. 


The SR register is stacked on the software stack when a JSR is executed or when an interrupt occurs. The 
SR register is restored from the stack upon completion of an interrupt service routine by the 
return-from-interrupt instruction (RTI). The program extension bits in the SR are restored from the stack 
by the return-from-subroutine (RTS) instruction—all other SR bits are unaffected. 


The SR format is shown in Figure 5-4 on page 5-7 and is also described in the following subsections. 
SR (Status Register)    Reset = $0300    Read/Write 

       |------- Mode Register (MR) -------|  |-- Condition Code Register (CCR) --|
Bit     15  14  13  12  11  10   9   8        7   6   5   4   3   2   1   0
Field   LF   *   *   *   *   *  I1  I0       SZ   L   E   U   N   Z   V   C

LF: Loop Flag 
I1, I0: Interrupt Mask 
SZ: Size 
L: Limit 
E: Extension 
U: Unnormalized 
N: Negative 
Z: Zero 
V: Overflow 
C: Carry 

* Indicates reserved bits that are read as zero and should be written with zero for future compatibility 


Figure 5-4. Status Register Format 
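
As a reading aid, the SR layout can be expressed as a short decoding sketch. The bit positions are taken from the subsections that follow; the names SR_FIELDS, split_sr, and decode_sr are hypothetical and exist only for this illustration.

```python
# Bit positions of the named SR fields (bits 10-14 are reserved).
SR_FIELDS = {
    "C": 0, "V": 1, "Z": 2, "N": 3, "U": 4, "E": 5, "L": 6, "SZ": 7,
    "I0": 8, "I1": 9, "LF": 15,
}


def split_sr(sr):
    """Return (MR, CCR): the high-order and low-order bytes of the SR."""
    sr &= 0xFFFF
    return (sr >> 8) & 0xFF, sr & 0xFF


def decode_sr(sr):
    """Return a dict mapping each named SR field to its bit value."""
    return {name: (sr >> bit) & 1 for name, bit in SR_FIELDS.items()}
```

Decoding the reset value $0300 with this sketch shows both interrupt mask bits set and every CCR bit cleared, consistent with the reset behavior described in this section.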


5.1.8.1 Carry (C)—Bit 0 


The carry (C) bit (SR bit 0) is set if a carry is generated out of the MSB of the result for an addition. It also 
is set if a borrow is generated in a subtraction. If the CC bit in the OMR register is zero, the carry or borrow 
is generated out of bit 35 of the result. If the CC bit in the OMR register is one, the carry or borrow is 
generated out of bit 31 of the result. The carry bit is also modified by bit manipulation and shift 
instructions. Otherwise, this bit is cleared. 


5.1.8.2 Overflow (V)—Bit 1 


If the CC bit in the OMR register is zero and if an arithmetic overflow occurs in the 36-bit result, the 
overflow (V) bit (SR bit 1) is set. If the CC bit in the OMR register is one and an arithmetic overflow 
occurs in the 32-bit result, the overflow bit is set. This indicates that the result is not representable in the 
accumulator register and the accumulator register has overflowed. Otherwise, this bit is cleared. 


5.1.8.3 Zero (Z)—Bit 2 


The zero (Z) bit (SR bit 2) is set if the result equals zero. Otherwise, this bit is cleared. The number of bits 
checked for the zero test depends on the OMR’s CC bit and which instruction is executed, as documented 
in Section 3.6, “Condition Code Generation,” on page 3-33. 


5.1.8.4 Negative (N)—Bit 3 


If the CC bit in the OMR register is zero and if bit 35 of the result is set, the negative (N) bit (SR bit 3) is 
set. If the CC bit in the OMR register is one and if bit 31 of the result is set, the negative bit is set. 
Otherwise, this bit is cleared. 
5.1.8.5 Unnormalized (U)—Bit 4 


The unnormalized (U) bit (SR bit 4) is set if the two most significant bits of the most significant product 
portion of the result are the same, and is cleared otherwise. The U bit is computed as follows: 
U = NOT (Bit 31 XOR Bit 30). 


If the U bit is cleared, then a positive fractional number, p, satisfies the following relation: 0.5 < p < 1.0. 
A negative fractional number, n, satisfies the following relation: -1.0 < n < -0.5. 


This bit is not affected by the OMR’s CC bit. 


5.1.8.6 Extension (E)—Bit 5 


The extension (E) bit (SR bit 5) is cleared if all the bits of the integer portion (bits 35-31) of the 36-bit 
result are the same (the upper five bits of the value are 00000 or 11111). Otherwise, this bit is set. 


If E is cleared, then the MS and LS portions of an accumulator contain all the bits with information—the 
extension register only contains sign extension. In this case, the accumulator extension register can be 
ignored. If E is set, then the extension register in the accumulator is in use. 


This bit is not affected by the OMR’s CC bit. 
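
The U and E definitions above can be checked with a short sketch that operates on a 36-bit result held in a Python integer. The function names (bit, u_bit, e_bit) are hypothetical, and the sketch models only the two rules stated in the text.

```python
MASK36 = (1 << 36) - 1  # a 36-bit accumulator result


def bit(value, n):
    """Return bit n of value."""
    return (value >> n) & 1


def u_bit(result36):
    """U is set when bits 31 and 30 of the result are the same."""
    return 1 if bit(result36, 31) == bit(result36, 30) else 0


def e_bit(result36):
    """E is cleared when bits 35-31 are all zeros or all ones."""
    upper5 = (result36 & MASK36) >> 31
    return 0 if upper5 in (0b00000, 0b11111) else 1
```

For example, the value $0 2000 0000 (0.25) gives U = 1 because it could be shifted left without overflow, and the value $1 0000 0000 gives E = 1 because the extension register holds more than sign extension.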


5.1.8.7 Limit (L)—Bit 6 


The limit (L) bit (SR bit 6) is set if the overflow bit is set or if the data limiters perform a limiting 
operation; it is not affected otherwise. The L bit is cleared only by a processor reset or an instruction that 
specifically clears it. This allows the L bit to be used as a latching overflow bit. Note that L is affected by 
data movement operations that read the A or B accumulator registers onto the CGDB. 


This bit is not affected by the OMR’s CC bit. 


5.1.8.8 Size (SZ)—Bit 7 


The size (SZ) bit (SR bit 7) is set when moving a 36-bit accumulator to data memory if bits 30 and 29 of 
the source accumulator are not the same—that is, if they are not both ones or zeros. This bit is latched, so it 
will remain set until the processor is reset or an instruction explicitly clears it. 


By monitoring the SZ bit, it is possible to determine whether a value is growing to the point where it will 
be saturated or limited when moved to data memory. It is designed for use in the fast Fourier transform 
(FFT) algorithm, indicating that the next pass in the algorithm should scale its results before computation. 
This allows FFT data to be scaled only on passes where it is necessary instead of on each pass, which in 
turn helps guarantee maximum accuracy in an FFT calculation. 
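
The following sketch illustrates how the latched SZ bit could drive a scale-only-when-needed decision for an FFT pass. It is an assumption-laden illustration: the function names are hypothetical, and it presumes that software clears SZ before each pass.

```python
def sz_after_store(sz, accumulator36):
    """Return the SZ bit after one accumulator is moved to data memory."""
    b30 = (accumulator36 >> 30) & 1
    b29 = (accumulator36 >> 29) & 1
    # SZ is latched: once set it stays set until explicitly cleared.
    return 1 if (sz or b30 != b29) else 0


def pass_needs_scaling(stored_accumulators):
    """Return True if SZ ends up set after storing all values in a pass."""
    sz = 0  # assumption: software cleared SZ before the pass started
    for value in stored_accumulators:
        sz = sz_after_store(sz, value)
    return sz == 1
```

A pass that stores only small values leaves SZ clear, so the next pass can skip scaling; a single value with differing bits 30 and 29 is enough to request it.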


5.1.8.9 Interrupt Mask (I1 and I0)—Bits 9-8 


The interrupt mask (I1 and I0) bits (SR bits 9 and 8) reflect the current priority level of the DSP core and 
indicate the interrupt priority level (IPL) needed for an interrupt source to interrupt the processor. The 
current priority level of the processor may be changed under software control. Interrupt mask bit I0 must 
always be written with a one to ensure future compatibility and compatibility with other family members. 
The interrupt mask bits are set during processor reset. See Table 5-1 on page 5-9 for interrupt mask bit 
definitions. 
Table 5-1. Interrupt Mask Bit Definition 


I1  I0  Exceptions Permitted  Exceptions Masked 
0   0   (Reserved)            (Reserved) 
0   1   IPL 0, 1              None 
1   0   (Reserved)            (Reserved) 
1   1   IPL 1                 IPL 0 
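
Table 5-1 can be read as a lookup on SR bits 9 and 8, as the following sketch shows. The function name permitted_ipls is hypothetical, and raising ValueError for the reserved encodings is a choice made for this illustration only.

```python
def permitted_ipls(sr):
    """Return the set of interrupt priority levels permitted by I1 and I0."""
    i1 = (sr >> 9) & 1
    i0 = (sr >> 8) & 1
    table = {
        (0, 1): {0, 1},  # nothing masked
        (1, 1): {1},     # IPL 0 masked
    }
    if (i1, i0) not in table:
        # I0 = 0 encodings are reserved; I0 must always be written as one.
        raise ValueError("reserved interrupt mask encoding")
    return table[(i1, i0)]
```

With the reset value $0300, the sketch reports that only IPL 1 exceptions are permitted.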


5.1.8.10 Reserved SR Bits—Bits 10-14 


SR bits 10-14 are reserved for future expansion and will read as zero during DSP read 
operations. These bits should be written with zero for future compatibility. 


5.1.8.11 Loop Flag (LF)—Bit 15 


The loop flag (LF) bit (SR bit 15) is set when a program loop is in progress and enables the detection of the 
end of a program loop. The LF bit is the only SR bit that is restored when terminating a program loop. 
Stacking and restoring the LF when initiating and exiting a program loop, respectively, allows the nesting 
of program loops; see Section 5.1.9.7, “Nested Looping Bit (NL)—Bit 15.” REP looping does not affect 
this bit. The LF is cleared during processor reset. 


NOTE: 


The LF is not cleared at the start of an interrupt service routine. This differs 
from the DSP56100 Family, where this bit is cleared upon entering an 
interrupt service routine. This will not cause a problem as long as the 
interrupt service routine code does not fetch the instruction whose address 
is stored in the LA register. This is typically the case because usually the 
interrupt service routine is located in a separate portion of program 
memory. 


This bit should never be explicitly cleared by a MOVE or bit-field 
instruction when the NL bit in the OMR register is set to a one. 


The LF bit is also affected by any accesses to the hardware stack register. Any move instruction that writes 
this register copies the old contents of the LF bit into the NL bit and then sets the LF bit. Any reads of this 
register, such as from a MOVE or TSTW instruction, copy the NL bit into the LF bit and then clear the NL 
bit. 


5.1.9 Operating Mode Register 


The operating mode register (OMR) is a 16-bit register that defines the current chip operating mode of the 
processor. The OMR bits are affected by processor reset, operations on the HWS, and instructions that 
directly reference the OMR. A DO loop will also affect the OMR, specifically the NL bit. 


During processor reset, the chip operating mode bits will be loaded from the external mode select pins. The 
operating mode register format is shown in Figure 5-5 on page 5-10 and is described in the subsequent 
discussion. 
NOTE: 


When a bit of the OMR is changed by an instruction, a delay of one 
instruction cycle is necessary before the new mode comes into effect. 


OMR (Operating Mode Register)    Reset = $0000    Read/Write 

Bit     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
Field   NL   *   *   *   *   *   *  CC   *  SD   R  SA  EX   *  MB  MA

NL: Nested Looping 
CC: Condition Codes 
SD: Stop Delay 
R: Rounding 
SA: Saturation 
EX: External X Memory 
MA, MB: Operating Mode 

* Indicates reserved bits that are read as zero and should be written with zero for future compatibility 


Figure 5-5. Operating Mode Register (OMR) Format 


5.1.9.1 Operating Mode Bits (MB and MA)—Bits 1-0 


The chip operating mode (MB and MA) bits (OMR bits 1 and 0) indicate the operating mode and memory 
maps of a DSP chip that has an external bus. Possible operating modes for a program RAM part are shown 
in Table 5-2. 


Table 5-2. Program RAM Operating Modes 


MB  MA  Chip Operating Mode                   Reset Vector            Program Memory Configuration 
0   0   Bootstrap 0 (Boot from External Bus)  BOOTROM P:$0000         Internal P-RAM is write only 
0   1   Bootstrap 1 (Boot from Peripheral)    BOOTROM P:$0000         Internal P-RAM is write only 
1   0   Normal Expanded                       External Pmem P:$E000   Internal Pmem enabled 
1   1   Development                           External Pmem P:$0000   Internal Pmem disabled 


The exact implementation of the mode bits, and the number of modes supported, depends on the specific 
DSP56800 Family device being used. See the appropriate user’s manual for more detailed information on 
the operating modes. 


The bootstrap modes are used to initially load an on-chip program RAM, upon exiting reset, from external 
memory or through a peripheral. Operating modes 0 and 1 typically would be different for a program ROM 
part because no bootstrapping operation is required for a ROM part. An example of possible operating 
modes for a program ROM part is shown in Table 5-3 on page 5-11. 
Table 5-3. Program ROM Operating Modes 


MB  MA  Chip Operating Mode  Reset Vector            Program Memory Configuration 
0   0   Single Chip          Internal PROM P:$0000   Internal Pmem enabled 
0   1   (Reserved)           (Reserved)              (Reserved) 
1   0   Normal Expanded      External Pmem P:$E000   Internal Pmem enabled 
1   1   Development          External Pmem P:$0000   Internal Pmem disabled 


The MB and MA bit values are typically established on reset from an external input. Once the chip leaves 
reset, they can be changed under software control. For more information about how they are configured on 
reset, consult the appropriate device’s user’s manual. 


5.1.9.2 External X Memory Bit (EX)—Bit 3 


The external X memory (EX) bit (OMR bit 3), when set, forces all primary data memory accesses to be 
external. The only exception to this rule is that if a MOVE or bit-field instruction is executed using the I/O 
short addressing mode, then the EX bit is ignored, and the access is performed to the on-chip location. 
When the EX bit is cleared, internal X memory can be accessed with all addressing modes. This bit is 
cleared by processor reset. 


The EX bit is ignored by the second read of a dual-read instruction, which uses the XAB2 and XDB2 buses 
and always accesses on-chip X data memory. For instructions with two parallel reads, the second read is 
always performed to internal on-chip memory. Refer to Section 6.1, “Introduction to Moves and Parallel 
Moves,” on page 6-1 for a description of the dual-read instructions. 


5.1.9.3 Saturation (SA)—Bit 4 


The saturation (SA) bit (OMR bit 4) enables automatic saturation on 32-bit arithmetic results, providing a 
user-enabled saturation mode for DSP algorithms that do not recognize or cannot take advantage of the 
extension accumulator. When the SA bit is set, automatic saturation occurs at the output of the MAC unit for basic 
arithmetic operations such as multiplication, addition, and so on. The SA bit is cleared by processor reset. 
Automatic limiting as outlined in Section 3.4.1, “Data Limiter,” on page 3-26 is not affected by the state of 
the SA bit. 


Saturation is performed by a dedicated circuit inside the MAC unit. The saturation logic operates by 
checking 3 bits of the 36-bit result out of the MAC unit: EXT[3], EXT[0], and MSP[15]. When the SA bit 
is set, these 3 bits determine if saturation is performed on the MAC unit’s output and whether to saturate to 
the maximum positive or negative value, as shown in Table 5-4. 


Table 5-4. MAC Unit Outputs With Saturation Mode Enabled (SA = 1) 


EXT[3] EXT[0] MSP[15] Result Stored in Accumulator 
0 0 0 (Unchanged) 
0 0 1 $0 7FFF FFFF 
0 1 0 $0 7FFF FFFF 
0 1 1 $0 7FFF FFFF 
1 0 0 $F 8000 0000 
1 0 1 $F 8000 0000 
1 1 0 $F 8000 0000 
1 1 1 (Unchanged) 
NOTE: 


Saturation mode is always disabled during the execution of the following 
instructions: ASLL, ASRR, LSLL, LSRR, ASRAC, LSRAC, IMPY16, 
MPYSU, MACSU, AND, OR, EOR, NOT, LSL, LSR, ROL, and ROR. 
For these instructions, no saturation is performed at the output of the MAC 
unit. 
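

Table 5-4 reduces to a simple rule: the result is left unchanged when EXT[3], EXT[0], and MSP[15] are all equal, and is otherwise saturated in the direction given by EXT[3]. The following sketch expresses that rule for SA = 1. The function name saturate is hypothetical, and the sketch does not model the instructions for which saturation is disabled.

```python
MASK36 = (1 << 36) - 1
MAX_POSITIVE = 0x0_7FFF_FFFF
MAX_NEGATIVE = 0xF_8000_0000


def saturate(result36):
    """Apply the Table 5-4 rule to a 36-bit MAC unit output (SA = 1)."""
    result36 &= MASK36
    ext3 = (result36 >> 35) & 1   # EXT[3]
    ext0 = (result36 >> 32) & 1   # EXT[0]
    msp15 = (result36 >> 31) & 1  # MSP[15]
    if ext3 == ext0 == msp15:
        return result36           # value fits in 32 bits: unchanged
    return MAX_POSITIVE if ext3 == 0 else MAX_NEGATIVE
```

For instance, $0 8000 0000 (just above the largest positive 32-bit value) saturates to $0 7FFF FFFF, while $F 8000 0000 is already representable and passes through unchanged.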


5.1.9.4 Rounding Bit (R)—Bit 5 


The rounding (R) bit (OMR bit 5) selects between convergent rounding and two’s-complement rounding. 
When set, two’s-complement rounding (always round up) is used. The two rounding modes are discussed 
in Section 3.5, “Rounding,” on page 3-30. This bit is cleared by processor reset. 
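
The difference between the two rounding modes can be sketched as follows. This sketch assumes the conventional definitions (convergent rounding sends an exact half to the nearest even value; two's-complement rounding always rounds a half up) and assumes that rounding is applied at the boundary between the MSP and LSP with the LSP cleared afterward. Section 3.5 remains the authoritative description; the function name round_acc is hypothetical.

```python
MASK36 = (1 << 36) - 1


def round_acc(acc36, r_bit):
    """Round a 36-bit accumulator value into its upper 20 bits.

    r_bit = 0 selects convergent rounding; r_bit = 1 selects
    two's-complement rounding.
    """
    acc36 &= MASK36
    lsp = acc36 & 0xFFFF
    rounded = (acc36 + 0x8000) & MASK36   # add one half of the MSP's LSB
    if r_bit == 0 and lsp == 0x8000:
        rounded &= ~(1 << 16)             # exact tie: force an even result
    return rounded & ~0xFFFF & MASK36     # LSP is cleared (assumption)
```

The two modes differ only when the LSP is exactly $8000: the value $0 0002 8000 rounds to $0 0002 0000 in convergent mode and to $0 0003 0000 in two's-complement mode.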


5.1.9.5 Stop Delay Bit (SD)—Bit 6 


The stop delay (SD) bit (OMR bit 6) is used to select the delay that the DSP needs to exit the stop mode. 
When set, the processor exits quickly from stop mode. This bit is cleared by processor reset. 


5.1.9.6 Condition Code Bit (CC)—Bit 8 


The condition code (CC) bit (OMR bit 8) selects whether condition codes are generated using a 36-bit 
result from the MAC array or a 32-bit result. When this bit is set, the C, N, V, and Z condition codes are 
generated based on bit 31 of the data ALU result. When this bit is cleared, the C, N, V, and Z condition 
codes are generated based on bit 35 of the data ALU result. The generation of the L, E, and U condition 
codes is not affected by the CC bit. This bit is cleared by processor reset. 


NOTE: 


The unsigned condition tests used when branching or jumping (HI, HS, 
LO, and LS) can only be used when the condition codes are generated with 
this bit set to one. Otherwise, the chip will not generate the unsigned 
conditions correctly. 


The effects of the CC bit on the condition codes generated by data ALU arithmetic operations are 
discussed in more detail in Section 3.6, “Condition Code Generation,” on page 3-33. 
5.1.9.7 Nested Looping Bit (NL)—Bit 15 


The nested looping (NL) bit (OMR bit 15) is used to display the status of program DO looping and the 
hardware stack. If this bit is set, then the program is currently in a nested DO loop (that is, two DO loops 
are active). If this bit is cleared, then there may be a single or no DO loop active. This bit is necessary for 
saving and restoring the contents of the hardware stack, which is described further in Section 8.13, 
“Multitasking and the Hardware Stack,” on page 8-34. REP looping does not affect this bit. 


It is important that the user never put the processor in the illegal combination specified in Table 5-5. This 
can be avoided by ensuring that the LF bit is never cleared when the NL bit is set. 


The NL bit is cleared on processor reset. Also see Section 5.1.8.11, “Loop Flag (LF)—Bit 15,” which 
discusses the LF bit in the SR. 


Table 5-5. Looping Status 


NL LF DO Loop Status 
0 0 No DO loops active 
0 1 Single DO loop active 
1 0 (Illegal combination) 
1 1 Two DO loops active 


If both the NL and LF bits are set (that is, two DO loops are active) and a DO instruction is executed, a 
hardware-stack-overflow interrupt occurs because there is no more space on the hardware stack to support 
a third DO loop. 


The NL bit is also affected by any accesses to the hardware stack register. Any MOVE instruction that 
writes this register copies the old contents of the LF bit into the NL bit and then sets the LF bit. Any reads 
of this register, such as from a MOVE or TSTW instruction, copy the NL bit into the LF bit and then clear 
the NL bit. 
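

Because reads and writes of the hardware stack register shift the LF and NL bits as described, a context can be saved and restored purely through HWS accesses. The sketch below illustrates one possible sequence; it is not the routine from Section 8.13, all names are hypothetical, and neither overflow on write nor the value returned on underflow is modeled faithfully.

```python
class HwsRegister:
    """Toy model of HWS register accesses and their effect on LF and NL."""

    def __init__(self, entries=(), lf=0, nl=0):
        self.entries = list(entries)  # top of stack is the last element
        self.lf = lf
        self.nl = nl

    def write(self, value):
        """MOVE to HWS: push, copy old LF into NL, then set LF."""
        self.entries.append(value & 0xFFFF)
        self.nl = self.lf
        self.lf = 1

    def read(self):
        """MOVE from HWS: pop, copy NL into LF, then clear NL."""
        # The value returned on underflow is a placeholder assumption.
        value = self.entries.pop() if self.entries else 0
        self.lf = self.nl
        self.nl = 0
        return value


def save_hws(reg):
    """Read the HWS until LF is clear; return the values, top first."""
    saved = []
    while reg.lf:
        saved.append(reg.read())
    return saved


def restore_hws(reg, saved):
    """Write the saved values back in reverse order to rebuild the state."""
    for value in reversed(saved):
        reg.write(value)
```

Starting from two active loops (NL = 1, LF = 1), two reads leave both bits clear, and writing the values back in reverse order restores the stack and both bits. At no point does the illegal combination NL = 1, LF = 0 occur.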


5.1.9.8 Reserved OMR Bits—Bits 2, 7 and 9-14 


The OMR bits 2, 7, and 9-14 are reserved. They will read as zero during DSP read operations and should 
be written as zero to ensure future compatibility. 


5.2 Software Stack Operation 


The software stack is a last-in-first-out (LIFO) stack of arbitrary depth implemented using memory 
locations in the X data memory. It is accessed through the POP instruction and the PUSH instruction 
macro (see Section 8.5, “Multiple Value Pushes,” on page 8-19) and will read or write the location in the X 
data memory pointed to by the stack pointer (SP) register. The PUSH instruction macro (two instruction 
cycles) pre-increments the SP register, and the POP instruction (one instruction cycle) will post-decrement 
the SP register. 
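

The pre-increment and post-decrement behavior can be sketched as follows. The class name SoftwareStack and the use of a dictionary for X data memory are hypothetical conveniences for this illustration.

```python
class SoftwareStack:
    """Toy model of the software stack in X data memory."""

    def __init__(self, sp):
        self.sp = sp & 0xFFFF  # stack pointer register
        self.xmem = {}         # sparse stand-in for X data memory

    def push(self, value):
        """PUSH macro: pre-increment SP, then write X:(SP)."""
        self.sp = (self.sp + 1) & 0xFFFF
        self.xmem[self.sp] = value & 0xFFFF

    def pop(self):
        """POP: read X:(SP), then post-decrement SP."""
        value = self.xmem[self.sp]
        self.sp = (self.sp - 1) & 0xFFFF
        return value
```

Note that in this model the stack grows toward higher addresses and SP always points at the most recently pushed value.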


The program counter and the SR are pushed on this stack for subroutine calls and interrupts. These 
registers are pulled from the stack for returns from subroutines using the RTS instruction (which restores 
only the program extension bits in SR), and for returns from interrupt service routines that use the RTI 
instruction (the entire SR is restored from the stack). 
The software stack is also used for nesting hardware DO loops in software on the DSP56800 architecture. 
The user must stack and unstack the LA and LC registers explicitly if DO loops are nested, and the 
software stack is typically used for this purpose, as demonstrated in 
Section 8.6.4, “Nested Loops,” on page 8-22. The hardware stack is still used, however, for stacking the 
address of the first instruction in the loop. Because the software stack is implemented using locations in 
the X data memory, the number of interrupts, jump-to subroutines, or combinations of these that it 
can accommodate is limited only by the memory allocated to it. 


NOTE: 


Care must be taken to allocate enough space in the X data memory so that 
stack operations do not overlap other areas of data used by the program. 
Similarly, it may be desirable to locate the stack in on-chip memory to 
avoid delays due to wait states or bus arbitration. 


See Section 8.5, “Multiple Value Pushes,” on page 8-19 and Section 8.8, “Parameters and Local 
Variables,” on page 8-28 for recommended techniques for using the software stack. 


5.3 Program Looping 


The DSP core supports looping on a single instruction (REP looping) and looping on a block of 
instructions (DO looping). Hardware DO looping allows fast looping on a block of instructions and is 
interruptible. Once the loop is set up with the DO instruction, there is no additional execution time to 
perform the looping tasks. REP looping repeats a one-word instruction for the specified number of times 
and can be efficiently nested within a hardware DO loop. It allows for excellent code density because 
blocks of in-line code of a single instruction can be replaced with a one-word REP instruction followed by 
the instruction to be repeated. The correct programming of loops is discussed in detail in Section 8.6, 
“Loops,” on page 8-20. 


5.3.1 Repeat (REP) Looping 


The REP instruction is a one-word instruction that performs single-instruction repeating on one-word 
instructions. It repeats the execution of a single instruction the number of times specified either with a 
6-bit unsigned value or with the 13 least significant bits of a DSP core register. When a repeat loop is 
begun, the instruction to be repeated is only fetched once from the program memory; it is not fetched each 
time the repeated instruction is executed. Repeat looping does not use any locations on the hardware stack. 
It also has no effect on the LF or NL bits in the SR and OMR, respectively. Repeat looping cannot be used 
on an instruction that accesses the program memory; it is necessary to use DO looping in this case. 


NOTE: 


REP loops are not interruptible since they are fetched only once. A DO 
loop with a single instruction can be used in place of a REP instruction if 
it is necessary to be able to interrupt while the loop is in progress. 


For the case of REP looping with a register value, when the register 
contains the value zero, then the instruction to be repeated is not executed 
(as is desired in an application), and instruction flow continues with the 
next sequential instruction. This is also true when an immediate value of 
zero is specified. 
5.3.2 DO Looping 


The DO instruction is a two-word instruction that performs hardware looping on a block of instructions. It 
executes this block of instructions the number of times specified either with a 6-bit unsigned value or 
using the 13 least significant bits of a DSP core register. DO looping is interruptible and uses one location 
on the hardware stack for each DO loop. For cases where an immediate value larger than 63 is desired for 
the loop count, it is possible to use the technique presented in Section 8.6.1, “Large Loops (Count Greater 
Than 63),” on page 8-20. 


The program controller’s 13-bit loop count (LC) register and 16-bit loop address (LA) register are used to 
implement no-overhead hardware program loops. When a program loop is initiated with the execution of a DO 
instruction, the following events occur: 


1. The LC and LA registers are loaded with values specified in the DO instruction. 
2. The SR’s LF bit is set, and its old value is placed in the NL bit. 
3. The address of the first instruction in the program loop is pushed onto the hardware stack. 


A program loop begins execution after the DO instruction and continues until the program address fetched 
equals the loop address register contents (the last address of the program loop). The contents of the loop 
counter are then tested for one. If the loop counter is not equal to one, the loop counter is decremented and 
the top location in the hardware stack is read (but not pulled) into the PC to return to the top of the loop. If 
the loop counter is equal to one, the program loop is terminated by incrementing the PC, purging the stack 
(pulling the top location and discarding the contents), and continuing with the instruction immediately 
after the last instruction in the loop. 


NOTE: 


For the case of DO looping with a register value, when the register contains 
the value zero, then the loop code is repeated 2^k times, where k = 13 is the 
number of bits in the LC register (that is, 8192 times). If there is a possibility 
that a register value may be less than or equal to zero, then the technique outlined in 
Section 8.6.2, “Variable Count Loops,” on page 8-21 should be used. A 
DO loop with an immediate value of zero is not allowed. 
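

The test-for-one rule explains why a register count of zero produces 2^13 iterations of a DO loop while the same count skips a REP instruction entirely. The following sketch models only the iteration counts; the function names are hypothetical.

```python
LC_MASK = 0x1FFF  # the loop counter is 13 bits wide


def do_iterations(register_value):
    """Count how many times a DO loop body runs for a register loop count."""
    lc = register_value & LC_MASK
    iterations = 0
    while True:
        iterations += 1           # the loop body executes
        if lc == 1:               # end-of-loop test: LC equal to one?
            break
        lc = (lc - 1) & LC_MASK   # otherwise decrement (wraps at 13 bits)
    return iterations


def rep_iterations(register_value):
    """Count how many times REP executes the repeated instruction."""
    return register_value & LC_MASK  # zero means it is not executed
```

Because the body always runs before the counter is tested, a count of zero wraps to $1FFF on the first decrement and the loop does not end until the counter reaches one.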


5.3.3 Nested Hardware DO and REP Looping 


It is possible to nest up to two hardware DO loops and to nest a hardware REP loop within the two DO 
loops. It is recommended when nesting loops, however, that hardware DO loops not be nested within code. 
Instead, a software loop should be used for an outer loop instead of a second DO loop (see Section 8.6.4, 
“Nested Loops,” on page 8-22). 


The reason that nesting of hardware DO loops is supported is to provide for faster interrupt servicing. 
When hardware DO loops are not nested, a second hardware stack location is left available for immediate 
use by an interrupt service routine. 


5.3.4 Terminating a DO Loop 


A DO loop normally terminates when it has completed the last instruction of a loop for the last iteration of 
the loop (LC equals one). Two techniques for early termination of the DO loops are presented in 
Section 8.6.6, “Early Termination of a DO Loop,” on page 8-25. 
Chapter 6 
Instruction Set Introduction 


As indicated by the programming model in Figure 6-3 on page 6-5, the DSP architecture can be viewed as 
several functional units operating in parallel: 


• Data ALU 
• AGU 
• Program controller 
• Bit-manipulation unit 


The goal of the instruction set is to keep each of these units busy each instruction cycle. This achieves 
maximum speed, minimum power consumption, and minimum use of program memory. 


The complete range of instruction capabilities combined with the flexible addressing modes provides a very 
powerful assembly language for digital-signal-processing algorithms and general-purpose computing. 
(The addressing modes are presented in detail in Section 4.2, “Addressing Modes,” on page 4-6.) The 
instruction set has also been designed to allow for the efficient coding of DSP algorithms, control code, 
and high-level language compilers. Execution time is enhanced by the hardware looping capabilities. 


This section introduces the MOVE instructions available on the DSP core, the concept of parallel moves, 
the DSP instruction formats, the DSP core programming model, instruction set groups, a summary of the 
instruction set in tabular form, and an introduction to the instruction pipeline. The instruction summary is 
particularly useful because it shows not only every instruction but also the operands and addressing modes 
allowed for each instruction. 


6.1 Introduction to Moves and Parallel Moves 


To simplify programming, a powerful set of MOVE instructions is found on the DSP56800 core. This not 
only eases the task of programming the DSP, but also decreases the program code size and improves the 
efficiency, which in turn decreases the power consumption and MIPS required to perform a given task. 
Some examples of MOVE instructions are listed in Example 6-1. 


Example 6-1. MOVE Instruction Types 


MOVE <any_DSPcore_register>,<any_DSPcore_register> 
MOVE <any_DSPcore_register>,<X_Data_Memory> 
MOVE <any_DSPcore_register>,<On_chip_peripheral_register> 
MOVE <X_Data_Memory>,<any_DSPcore_register> 
MOVE <On_chip_peripheral_register>,<any_DSPcore_register> 
MOVE <immediate_value>,<any_DSPcore_register> 
MOVE <immediate_value>,<X_Data_Memory> 
MOVE <immediate_value>,<On_chip_peripheral_register> 
For any MOVE instruction accessing X data memory or an on-chip memory mapped peripheral register, 
seven different addressing modes are supported. Additional addressing modes are available on the subset 
of DSP core registers that are most frequently accessed, including the registers in the data ALU, and all 
pointers in the address generation unit. 


For all moves on the DSP56800, the syntax orders the source and destination as follows: SRC, DST. The 
source of the data to be moved and the destination are separated by a comma, with no spaces either before 
or after the comma. 


The assembler syntax also specifies which memory is being accessed (program or data memory) on any 
memory move. Table 6-1 shows the syntax for specifying the correct memory space for any memory 
access; an example of a program memory access is shown where the address is contained in the register R2 
and the address register is post-incremented after the access. The two examples for X data memory 
accesses show an address-register-indirect addressing mode in the first example and an absolute address in 
the second. 


Table 6-1. Memory Space Symbols 


Symbol  Examples   Description 
P:      P:(R2)+    Program memory access 
X:      X:(R0)     X data memory access 
        X:$C000 


The DSP56800 instruction set supports two additional types of moves—the single parallel move and the 
dual parallel read. Both of these are considered “parallel moves” and are extremely powerful for DSP 
algorithms and numeric computation. 


The single parallel move allows an arithmetic operation and one memory move to be completed with one 
instruction in one instruction cycle. For example, it is possible to add two numbers while reading a value 
from memory or writing a value to memory in the same instruction. 


Figure 6-1 illustrates a single parallel move, which uses one program word and executes in one instruction 
cycle. 


ADD X0,A    Y0,X:(R1)+N    ; One DSP56800 Instruction 

ADD X0,A: Opcode and Operands 
Y0,X:(R1)+N: Single Parallel Move (Uses XAB1 and CGDB) 


Figure 6-1. Single Parallel Move 


In the single parallel move, the following occurs: 


1. Register X0 is added to the register A and the result is stored in the A accumulator. 
2. The contents of the Y0 register are moved into the X data memory at the location contained 
in the R1 register. 
3. After completing the memory move, the R1 register is post-updated with the contents of the 
N register. 
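

The three steps can be traced with a small register-level sketch of the instruction ADD X0,A Y0,X:(R1)+N. It is a simplification: plain integer addition stands in for the data ALU operation (accumulator width and operand alignment are not modeled), and the function and dictionary names are hypothetical.

```python
def add_with_parallel_write(regs, xmem):
    """Trace the effects of: ADD X0,A  Y0,X:(R1)+N"""
    # 1. Arithmetic operation (integer stand-in for the data ALU add).
    regs["A"] = regs["A"] + regs["X0"]
    # 2. Parallel move: Y0 is written to X data memory at the address in R1.
    xmem[regs["R1"]] = regs["Y0"]
    # 3. Post-update: R1 is advanced by the contents of N.
    regs["R1"] = (regs["R1"] + regs["N"]) & 0xFFFF
```

All three effects belong to a single instruction, so a program sees them together after one instruction cycle.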


The dual parallel read allows an arithmetic operation to occur and two values to be read from X data 
memory with one instruction in one instruction cycle. For example, it is possible to execute in the same 
instruction a multiplication of two numbers, with or without rounding of the result, while reading two 
values from X data memory to two of the data ALU registers. 
Figure 6-2 illustrates a dual parallel move, which uses one program word and executes in one instruction 
cycle. 


MACR X0,Y0,A    X:(R0)+N,Y0    X:(R3)-,X0 

MACR X0,Y0,A: Opcode and Operands 
X:(R0)+N,Y0: Primary Read (Uses XAB1 and CGDB) 
X:(R3)-,X0: Secondary Read (Uses XAB2 and XDB2) 


Figure 6-2. Dual Parallel Move 


In the dual parallel move, the following occurs: 


1. The contents of the X0 and Y0 registers are multiplied, this result is added to the A 
accumulator, and the final result is stored in the A accumulator. 
2. The contents of the X data memory location pointed to with the R0 register are moved into 
the Y0 register. 
3. The contents of the X data memory location pointed to with the R3 register are moved into 
the X0 register. 
4. After completing the memory moves, the R0 register is post-updated with the contents of 
the N register, and the R3 register is decremented by one. 
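

The four steps can be traced with a register-level sketch of the instruction MACR X0,Y0,A X:(R0)+N,Y0 X:(R3)-,X0. Several simplifications are assumed: an integer multiply-accumulate stands in for the fractional MACR operation, rounding is not modeled, and the arithmetic is assumed to use the X0 and Y0 values held before the two reads replace them. The names are hypothetical.

```python
def macr_with_dual_read(regs, xmem):
    """Trace the effects of: MACR X0,Y0,A  X:(R0)+N,Y0  X:(R3)-,X0"""
    old_x0, old_y0 = regs["X0"], regs["Y0"]
    # 1. Multiply-accumulate (integer stand-in; rounding not modeled).
    regs["A"] = regs["A"] + old_x0 * old_y0
    # 2. Primary read: X:(R0) is loaded into Y0.
    regs["Y0"] = xmem[regs["R0"]]
    # 3. Secondary read: X:(R3) is loaded into X0.
    regs["X0"] = xmem[regs["R3"]]
    # 4. Address updates: R0 advances by N, R3 is decremented by one.
    regs["R0"] = (regs["R0"] + regs["N"]) & 0xFFFF
    regs["R3"] = (regs["R3"] - 1) & 0xFFFF
```

Under these assumptions, each instruction consumes the operands fetched by the previous one while fetching the next pair, which is the pattern that makes this form useful inside a filter loop.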


Both types of parallel moves use a subset of available DSP56800 addressing modes, and the registers 
available for the move portion of the instruction are also a subset of the total set of DSP core registers. 
These subsets include the registers and addressing modes most frequently found in high-performance 
numeric computation and DSP algorithms. Also, the parallel moves allow a move to occur only with an 
arithmetic operation in the data ALU. A parallel move is not permitted, for example, with a JMP, LEA, or 
BFSET instruction. 


6.2 Instruction Formats 


Instructions are one, two, or three words in length. The instruction is specified by the first word of the 
instruction. The additional words may contain information about the instruction itself or may contain an 
operand for the instruction. Samples of assembly language source code for several instructions are shown 
in Table 6-2. 


From the instruction formats listed in Table 6-2, it can be seen that the DSP offers parallel processing using 
the data ALU, AGU, program controller, and bit-manipulation unit. In the parallel move example, the DSP 
can perform a designated ALU operation (data ALU) and up to two data transfers specified with address 
register updates (AGU), and will also decode the next instruction and fetch an instruction from program 
memory (program controller), all in one instruction cycle. When an instruction is more than one word in 
length, an additional instruction-execution cycle is required. Most instructions involving the data ALU are 
register based (that is, operands are in data ALU registers) and allow the programmer to keep each parallel 
processing unit busy. Instructions that are memory oriented (for example, a bit-manipulation instruction), 
all logical instructions, or instructions that cause a control flow change (such as a jump) prevent the use of 
all parallel processing resources during their execution. 
Table 6-2. Instruction Formats

Opcode (1)  Operands (2)     CGDB           XDB2           PDB            Comments
                             Transfer (3)   Transfer (4)   Transfer (5)
ADD         #$1234,Y1                                                     No parallel move
ANDC        #$7C,X:$E27                                                   No parallel move
ENDDO                                                                     No parallel move
TSTW        X:(SP-9)                                                      No parallel move
MAC         A1,Y0,B                                                       No parallel move
LEA         (R2)-                                                         No parallel move
MOVE        R0,Y0                                                         No parallel move
CMP         X0,B             Y0,X:(R2)+                                   Single parallel move
NEG         A                X:(R1)+N,X0                                  Single parallel move
SUB         Y1,A             X:(R0)+,Y0     X:(R3)+,X0                    Dual parallel read
MPY         X1,Y0,B          X:(R1)+N,Y1    X:(R3)+,X0                    Dual parallel read
MACR        X0,Y0,A          X:(R1)+N,Y0    X:(R3)-,X0                    Dual parallel read
MOVE                                                       X0,P:(R1)+     Program memory move
JMP         $3010                                                         16-bit jump address

1. Indicates data ALU, AGU, program controller, or bit-manipulation operation to be performed.
2. Specifies the operands used by the opcode.
3. Specifies optional data transfers over the CGDB bus.
4. Specifies optional data transfers over the XDB2 bus.
5. Specifies optional data transfers over the PDB bus.

6.3 Programming Model


The registers in the DSP56800 core programming model are shown in Figure 6-3.


[Figure 6-3 diagram: register layout of the data arithmetic logic unit (data ALU input registers and
accumulator registers), the pointer registers, offset register, and modifier register (M01), and the
program controller unit (program counter, status register (SR), operating mode register, loop address,
loop counter, and hardware stack (HWS)).]


Figure 6-3. DSP56800 Core Programming Model

6.4 Instruction Groups


The instruction set is divided into the following groups:

• Arithmetic
• Logical
• Bit manipulation
• Looping
• Move
• Program control


Each instruction group is described in the following subsections. In addition, Section 6.6, “DSP56800
Instruction Set Summary,” includes a useful summary of every instruction, with the addressing modes and
operand registers allowed for each. Detailed information on each instruction is given in Appendix A,
“Instruction Set Details.”


6.4.1 Arithmetic Instructions 


The arithmetic instructions perform all of the arithmetic operations within the data ALU. They may affect 
a subset or all of the condition code register bits. Arithmetic instructions are typically register based 
(register-direct addressing modes are used for operands) so that the data ALU operation indicated by the 
instruction does not use the CGDB or the XDB2, although some instructions can also operate on 
immediate data or operands in memory. 


Optional data transfers (parallel moves) may be specified with many arithmetic instructions. This allows
data to move in parallel over the CGDB and the XDB2 during a data ALU operation, so that new data can be
pre-fetched for use in following instructions and results calculated by previous instructions can be
stored. Arithmetic instructions typically execute in one instruction cycle, although some operations
take additional cycles with certain operand addressing modes. The arithmetic instructions are the only
class of instructions that allow parallel moves.


In addition to the arithmetic shifts presented here, other types of shifts are also available in the logical 
instruction group. See Section 6.4.2, “Logical Instructions.” Table 6-3 lists the arithmetic instructions. 


Table 6-3. Arithmetic Instructions List

Instruction   Description
ABS           Absolute value
ADC           Add long with carry¹
ADD           Add
ASL           Arithmetic shift left (36-bit)
ASLL          Arithmetic multi-bit shift left¹
ASR           Arithmetic shift right (36-bit)
ASRAC         Arithmetic multi-bit shift right with accumulate¹
ASRR          Arithmetic multi-bit shift right¹
CLR           Clear
CMP           Compare
DEC(W)        Decrement upper word of accumulator
DIV           Divide iteration¹
IMPY(16)      Integer multiply¹
INC(W)        Increment upper word of accumulator
MAC           Signed multiply-accumulate
MACR          Signed multiply-accumulate and round
MACSU         Signed/unsigned multiply-accumulate¹
MPY           Signed multiply
MPYR          Signed multiply and round
MPYSU         Signed/unsigned multiply¹
NEG           Negate
NORM          Normalize¹
RND           Round
SBC           Subtract long with carry¹
SUB           Subtract
Tcc           Transfer conditionally¹
TFR           Transfer data ALU register to an accumulator
TST           Test a 36-bit accumulator
TSTW          Test a 16-bit register or memory location¹

1. These instructions do not allow parallel data moves.


6.4.2 Logical Instructions 


The logical instructions perform all of the logical operations within the data ALU. They also affect the
condition code register bits. Logical instructions are register based, as are the arithmetic instructions
in Table 6-3, and, again, some can also operate on operands in memory. Optional data transfers are not
permitted with logical instructions. These instructions execute in one instruction cycle.


Table 6-4 lists the logical instructions. 
Table 6-4. Logical Instructions List

Instruction   Description
AND           Logical AND
EOR           Logical exclusive OR
LSL           Logical shift left
LSLL          Multi-bit logical shift left
LSRAC         Logical right shift with accumulate
LSR           Logical shift right
LSRR          Multi-bit logical shift right
NOT           Logical complement
OR            Logical inclusive OR
ROL           Rotate left
ROR           Rotate right


6.4.3 Bit-Manipulation Instructions 


The bit-manipulation instructions perform one of three tasks: 


• Testing a field of bits within a word
• Testing and modifying a field of bits in a word
• Conditionally branching based on a test of bits within the upper or lower byte of a word


Bit-field instructions can operate on any X memory location, peripheral, or DSP core register. BFTSTH 
and BFTSTL can test any field of the bits within a 16-bit word. BFSET, BFCLR, and BFCHG can test any 
field of the bits within a 16-bit word and then set, clear, or invert bits in this word, respectively. BRSET 
and BRCLR can only test an 8-bit field in the upper or lower byte of the word, and then conditionally 
branch based on the result of the test. The carry bit of the condition code register contains the result of the 
bit test for each instruction. These instructions are operations of the read-modify-write type. The BFTSTH, 
BFTSTL, BFSET, BFCLR, and BFCHG instructions execute in two or three instruction cycles. The 
BRCLR and BRSET instructions execute in four to six instruction cycles. 


Table 6-5 lists the bit-manipulation instructions. 


Table 6-5. Bit-Field Instruction List

Instruction   Description
ANDC          Logical AND with immediate data
BFCLR         Bit-field test and clear
BFSET         Bit-field test and set
BFCHG         Bit-field test and change
BFTSTL        Bit-field test low
BFTSTH        Bit-field test high
BRSET         Branch if selected bits are set
BRCLR         Branch if selected bits are clear
EORC          Logical exclusive OR with immediate data
NOTC          Logical complement on memory location and registers
ORC           Logical inclusive OR with immediate data

NOTE: 


Due to instruction pipelining, if an AGU register (Rn, N, SP, or M01) is
directly changed with a bit-field instruction, the new contents may not be
available for use until the second following instruction (see the restrictions
discussed in Section 4.4, “Pipeline Dependencies,” on page 4-33).


See Section 8.1.1, “Jumps and Branches,” on page 8-2 for other instructions that can be synthesized. 


6.4.4 Looping Instructions 


The looping instructions establish looping parameters and initiate zero-overhead program looping. They 
allow looping on a single instruction (REP) or a block of instructions (DO). For DO looping, the address of 
the first instruction in the program loop is saved on the hardware stack to allow no-overhead looping. The 
last address of the DO loop is specified as a 16-bit absolute address. No locations in the hardware stack are 
required for the REP instruction. The ENDDO instruction is used only when breaking out of the loop; 
otherwise, it is better to use MOVE #1,LC. This is discussed in more detail in Section 8.6.6, “Early 
Termination of a DO Loop,” on page 8-25. 


Table 6-6 lists the loop instructions. 


Table 6-6. Loop Instruction List 


Instruction Description 
DO Start hardware loop 
ENDDO Disable current loop and unstack parameters 
REP Repeat next instruction 


6.4.5 Move Instructions 


The move instructions move data over the various data buses: CGDB, PGDB, XDB2, and PDB. Move
instructions do not affect the condition code register, except for the limit bit if limiting is performed when
reading a data ALU accumulator register. These instructions do not allow optional data transfers. In
addition to the following move instructions, there are parallel moves that can be used simultaneously with
many of the arithmetic instructions. The parallel moves are shown in Table 6-34 on page 6-29 and
Table 6-35 on page 6-30 and are discussed in detail in Section 6.1, “Introduction to Moves and Parallel
Moves,” and Appendix A, “Instruction Set Details.” The LEA instruction is also included in this
instruction group.

There is a PUSH instruction macro, described in Section 8.5, “Multiple Value Pushes,” on page 8-19,
that can be used with the POP instruction presented here.


Table 6-7 lists the move instructions. 


Table 6-7. Move Instruction List 


Instruction Description 
LEA Load effective address 
POP Pop a register from the software stack 
MOVE Move data 
MOVE(C) Move control register 
MOVE(I) Move immediate 
MOVE(M) Move program memory 
MOVE(P) Move peripheral data
MOVE(S) Move absolute short 
NOTE: 


Due to instruction pipelining, if an AGU register (Rn, SP, or M01) is
directly changed with a move instruction, the new contents may not be
available for use until the second following instruction. See the restrictions
discussed in Section 4.4, “Pipeline Dependencies,” on page 4-33.


6.4.6 Program Control Instructions 


The program control instructions include branches, jumps, conditional branches, conditional jumps, and 
other instructions that affect the program counter and software stack. Program control instructions may 
affect the status register bits as specified in the instruction. Also included in this instruction group are the 
STOP and WAIT instructions that can place the DSP chip in a low-power state. See Section 8.1.1, “Jumps 
and Branches,” on page 8-2 and Section 8.11, “Jumps and JSRs Using a Register Value,” on page 8-33 for 
additional jump and branch instructions that can be synthesized from existing DSP56800 instructions. 


Table 6-8 lists the program control instructions. 


Table 6-8. Program Control Instruction List

Instruction   Description
Bcc           Branch conditionally
BRA           Branch
DEBUG         Enter debug mode
Jcc           Jump conditionally
JMP           Jump
JSR           Jump to subroutine
NOP           No operation
RTI           Return from interrupt
RTS           Return from subroutine
STOP          Stop processing (lowest power standby)
SWI           Software interrupt
WAIT          Wait for interrupt (low power standby)


6.5 Instruction Aliases 


The DSP56800 assembler provides a number of additional useful instruction mnemonics that are actually 
aliases to other instructions. Each of these instructions is mapped to one of the core instructions and 
disassembles as such. 


6.5.1 ANDC, EORC, ORC, and NOTC Aliases 


The DSP56800 instruction set does not support logical operations using 16-bit immediate data. It is 
possible to achieve the same result, however, using the bit-manipulation instructions. To simplify 
implementing these operations, the DSP56800 assembler provides the following operations: 


• ANDC—logically AND a 16-bit immediate value with a destination
• EORC—logically exclusive OR a 16-bit immediate value with a destination
• ORC—logically OR a 16-bit immediate value with a destination
• NOTC—logical one’s-complement of a 16-bit destination


These operations are not new instructions, but aliases to existing bit-manipulation instructions. They are 
mapped as shown in Table 6-9. 


Table 6-9. Aliases for Logical Instructions with Immediate Data

Instruction   Operands     Remapped Instruction   Operands
ANDC          #xxxx,DST    BFCLR                  #xxxx,DST
ORC           #xxxx,DST    BFSET                  #xxxx,DST
EORC          #xxxx,DST    BFCHG                  #xxxx,DST
NOTC          DST          BFCHG                  #$FFFF,DST

Note that for the ANDC instruction, a one’s-complement of the mask value is used when remapping to the
BFCLR instruction. For the NOTC instruction, all bits in the 16-bit mask are set to one.

In Example 6-2, an immediate value is logically ORed with a location in memory.


Example 6-2. Logical OR with a Data Memory Location

ORC #$00FF,X:$400 ; Set all bits of lower byte in X:$400


The assembler translates this instruction into BFSET #$00FF,X:$400, which performs the same
operation. If the assembled code is later disassembled, it will appear as a BFSET instruction.
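
The remapping in Table 6-9 can be checked with a short Python model. This is a sketch for illustration only: the function names are hypothetical, operands are treated as plain 16-bit integers, and the carry bit and other condition codes that the bit-field instructions affect are not modeled.

```python
MASK16 = 0xFFFF


def remap_alias(mnemonic, mask=None):
    """Return the (instruction, mask) pair an alias maps to, following Table 6-9."""
    if mnemonic == "ANDC":
        # BFCLR clears the selected bits, so the AND mask is one's-complemented.
        return ("BFCLR", ~mask & MASK16)
    if mnemonic == "ORC":
        return ("BFSET", mask & MASK16)
    if mnemonic == "EORC":
        return ("BFCHG", mask & MASK16)
    if mnemonic == "NOTC":
        # Inverting every bit is a change operation with an all-ones mask.
        return ("BFCHG", MASK16)
    raise ValueError(f"not an alias: {mnemonic}")


def execute_bitfield(instruction, mask, word):
    """Apply the data effect of a bit-field instruction to a 16-bit word."""
    if instruction == "BFCLR":
        return word & ~mask & MASK16
    if instruction == "BFSET":
        return (word | mask) & MASK16
    if instruction == "BFCHG":
        return (word ^ mask) & MASK16
    raise ValueError(f"unsupported instruction: {instruction}")


word = 0x1234
print(remap_alias("ORC", 0x00FF))
print(hex(execute_bitfield(*remap_alias("ANDC", 0x0F0F), word)))
```

For every alias, the remapped instruction produces the same data result as the logical operation it replaces. For example, clearing the bits selected by the complemented mask is equivalent to ANDing with the original mask.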


6.5.2 LSLL Alias 


Because the LSLL instruction operates identically to an arithmetic left shift, this instruction is actually 
assembled as an ASLL instruction. When the assembler encounters the LSLL mnemonic, an ASLL 
instruction is assembled. See Table 6-10. 


Table 6-10. LSLL Instruction Alias

Operation   Operands    Comments
LSLL        Y1,X0,DD    Multi-bit logical left shift.
            Y0,X0,DD
            Y1,Y0,DD    First register is the value to be shifted, second register is
            Y0,Y0,DD    the shift amount (uses 4 LSBs).
            A1,Y0,DD
            B1,Y1,DD    Use ASLL when left shifting is desired on one of the two
                        accumulators.


6.5.3 ASL Alias 


Because the ASL instruction operates similarly to a logical left shift when executed on the Y1, Y0, and X0
registers, this instruction is actually assembled as an LSL instruction. Note that while the result in the
destination register will be the same as if an arithmetic shift had been performed, condition codes are
calculated based on a logical shift and might differ from the expected result. See Table 6-11.


The ASL instruction is not aliased to LSL when the register specified is one of the accumulator registers.


Table 6-11. ASL Instruction Remapping

Operation   Operands      Comments
ASL         X0, Y0, Y1    Arithmetic left shift


6.5.4 CLR Alias 


Because CLR operates identically to a MOVE instruction with an immediate value of zero, a MOVE 
instruction is used to implement CLR when the specified register is a 16-bit register. When the assembler 
encounters the CLR mnemonic in a program, it assembles a MOVE #0,<register> instruction in its 
place. See Table 6-12. 


NOTE: 


This operation does not apply to the CLR instruction when it is performed 
on the A or B accumulators. 
Table 6-12. Clear Instruction Alias

Operation   Destination    Comments
CLR         X0, Y1, Y0,    Identical to MOVE #0,<register>; does not set condition
            A1, B1,        codes
            R0-R3, N


6.5.5 POP Alias 


The POP instruction operates identically to a move from the stack with post-decrement. When the
assembler encounters the POP instruction in a program, it assembles a MOVE (SP)-,<register>
instruction in its place. If POP does not specify a destination register, it is assembled as LEA (SP)-.


Table 6-13. Move Word Instruction Alias—Data Memory

Operation   Source   Destination        Comments
POP                  (Any register)     Pop a single stack location
                     (None specified)   Simply decrements the SP


6.6 DSP56800 Instruction Set Summary 


This section presents the entire DSP56800 instruction set in tabular form. The tables provide a quick 
reference to the entire instruction set because they show not only the instructions themselves, but also the 
registers, addressing modes, cycle counts, and program words required for each instruction. From these 
tables, it is very easy to determine if a particular operation can be performed with a desired register or 
addressing mode. 


The summary, found in Section 6.6.3, “Instruction Summary Tables,” is based on logical groupings of 
instructions, listing the instructions alphabetically within each grouping. This summary also contains the 
number of program words required by the instruction as well as the number of cycles required for 
execution. 


This section contains the following information:

• Usage of the instruction summary tables
• Addressing mode notation
• Register field notation
• The instruction summary tables


6.6.1 Register Field Notation 


There are many different register fields used within the instruction summary tables. These will be grouped 
into sets that are more easily understood. 


Table 6-14 shows the register set available for the most important move instructions. Sometimes the 
register field is broken into two different fields—one where the register is used as a source (src), and the 
other where it is used as a destination (dst). This is important because a different notation is used when an 
accumulator is being stored without saturation. Also see the register fields in Table 6-15, which are also 
used in move instructions as sources and destinations within the AGU. 

In some cases, the notation used when specifying an accumulator determines whether or not saturation is
enabled when the accumulator is being used as a source in a move or parallel move instruction. Refer to
Section 3.4.1, “Data Limiter,” on page 3-26 and Section 3.2, “Accessing the Accumulator Registers,” on
page 3-7 for information.


Table 6-14. Register Fields for General-Purpose Writes and Reads

Register Field   Registers in This Field   Comments
HHH              A, B, A1, B1              Seven data ALU registers—two accumulators, two 16-bit MSP
                 X0, Y0, Y1                portions of the accumulators, and three 16-bit data registers
HHHH             A, B, A1, B1              Seven data ALU and five AGU registers
                 X0, Y0, Y1
                 R0-R3, N
DDDDD            A, A2, A1, A0             All CPU registers
                 B, B2, B1, B0
                 Y1, Y0, X0
                 R0, R1, R2, R3
                 N, SP
                 M01
                 OMR, SR
                 LA, LC
                 HWS


Table 6-15 shows the register set available for use as pointers in address-register-indirect addressing 
modes. This table also shows the notation used for AGU registers in AGU arithmetic operations. 


Table 6-15. Address Generation Unit (AGU) Registers

Register Field   Registers in This Field   Comments
Rn               R0-R3                     Five AGU registers available as pointers for addressing and as
                 SP                        sources and destinations for move instructions
Rj               R0, R1, R2, R3            Four pointer registers available as pointers for addressing
N                N                         One index register available only for indexed addressing modes
M01              M01                       One modifier register


Table 6-16 shows the register set available for use in data ALU arithmetic operations. The most common
field used in this table is FDD.


Table 6-16. Data ALU Registers

Register Field   Registers in This Field   Comments
FDD              A, B                      Five data ALU registers—two 36-bit accumulators and three 16-bit
                 X0, Y0, Y1                data registers accessible during data ALU operations.
                                           Contains the contents of the F and DD register fields
F1DD             A1, B1                    Five data ALU registers—two 16-bit MSP portions of the
                 X0, Y0, Y1                accumulators and three 16-bit data registers accessible during data
                                           ALU operations
DD               X0, Y0, Y1                Three 16-bit data registers
F                A, B                      Two 36-bit accumulators accessible during parallel move
                                           instructions and some data ALU operations
F1               A1, B1                    The 16-bit MSP portion of two accumulators accessible as source
                                           operands in parallel move instructions


6.6.2 Using the Instruction Summary Tables 


This section contains helpful information on using the summary tables. It contains some notation used 
within the tables. 


The register field notation is found in Section 6.6.1, “Register Field Notation.” 


Some additional notation to be considered is found in the instruction summary tables when allowed 
registers for multiplications are specified (Table 6-22 on page 6-20). In these tables, the following entry is 
found: 


(±)Y0,X0,FDD


The notation (±) in this entry indicates that an optional + or - sign can be specified before the input register
combination. If a - is specified, the multiplication result is negated. This allows each of the following
examples to be valid DSP56800 instructions:

MAC X0,Y0,A ; A + X0*Y0 -> A
MAC +X0,Y0,A ; A + X0*Y0 -> A
MAC -X0,Y0,A ; A - (X0*Y0) -> A
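
The effect of the optional sign can be illustrated with a small Python model. It is a sketch under simplifying assumptions: register contents are interpreted as 16-bit signed fractions, exact rational arithmetic stands in for the 36-bit accumulator, and saturation, rounding, and condition codes are ignored. The helper names `q15` and `mac` are hypothetical.

```python
from fractions import Fraction


def q15(word):
    """Interpret a 16-bit word as a signed fraction in the range [-1, 1)."""
    word &= 0xFFFF
    if word & 0x8000:
        word -= 0x10000
    return Fraction(word, 0x8000)


def mac(acc, x0, y0, sign="+"):
    """Model MAC (sign)X0,Y0,A: accumulate the product, negating it for '-'."""
    product = q15(x0) * q15(y0)
    return acc - product if sign == "-" else acc + product


a = Fraction(1, 4)
print(mac(a, 0x4000, 0x2000))        # A + X0*Y0
print(mac(a, 0x4000, 0x2000, "-"))   # A - (X0*Y0)
```

Writing no sign and writing + give the same result. Only the - form changes the operation, by subtracting the product from the accumulator.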


As an example, Table 6-35 on page 6-30 shows all registers and addressing modes that are allowed when 
performing a dual read instruction, one of the DSP56800’s parallel move instructions. The instructions 
shown in Example 6-3 are allowed. 


Example 6-3. Valid Instructions

MOVE             X:(R0)+,Y0     X:(R3)+,X0
MACR X0,Y1,A     X:(R1)+N,Y1    X:(R3)-,X0
ADD  Y0,B        X:(R1)+N,Y0    X:(R3)+,X0


The instruction in Example 6-4 is not allowed:


Example 6-4. Invalid Instruction

ADD  X0,Y1,A     X:(R2)-,X0     X:(R3)+N,Y0

Consulting the information in Table 6-35 on page 6-30 shows that this instruction is not valid for each of
the following reasons:

• The only operands accepted for ADD or SUB are X0,F, Y1,F, Y0,F, A,B, or B,A, where F is either
  the A or B accumulator register. Thus, X0,Y1,A is an invalid entry.
• The pointer R2 is not allowed for the first memory read.
• The post-decrement addressing mode is not available for the first memory read.
• The X0 register may not be a destination for the first memory read because it is not listed in the
  Destination 1 column.
• The post-update by N addressing mode is not allowed for the second memory read. The second
  memory read is always identified as the memory move that uses R3 in instructions with two
  memory moves. For the second memory read, only the post-increment and post-decrement
  addressing modes are allowed.
• The Y0 register may not be a destination for the second memory read because it is not listed in the
  Destination 2 column.
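
The move-field rules in this list can be expressed as a small checker. The Python sketch below is illustrative only. The allowed pointers, addressing modes, and destinations are inferred from Examples 6-3 and 6-4 and the list above, which is an assumption; Table 6-35 remains the authoritative source. The function name is hypothetical, and the operand rule for the arithmetic operation itself (the first item in the list) is not checked.

```python
import re

# One move field such as "X:(R1)+N,Y1": pointer, update mode, destination.
MOVE_RE = re.compile(r"^X:\((R[0-3])\)([+-]N?),(\w+)$")

# Allowed choices per read, inferred from the examples (assumption).
RULES = {
    "first": ({"R0", "R1"}, {"+", "+N"}, {"Y0", "Y1"}),
    "second": ({"R3"}, {"+", "-"}, {"X0"}),
}


def check_dual_read(first, second):
    """Return a list of rule violations for the two memory reads (empty if none)."""
    problems = []
    for label, text in (("first", first), ("second", second)):
        pointers, modes, dests = RULES[label]
        match = MOVE_RE.match(text.replace(" ", ""))
        if match is None:
            problems.append(f"{label} read: cannot parse {text!r}")
            continue
        pointer, mode, dest = match.groups()
        if pointer not in pointers:
            problems.append(f"{label} read: pointer {pointer} not allowed")
        if mode not in modes:
            problems.append(f"{label} read: mode ({pointer}){mode} not allowed")
        if dest not in dests:
            problems.append(f"{label} read: destination {dest} not allowed")
    return problems


print(check_dual_read("X:(R1)+N,Y1", "X:(R3)-,X0"))
for problem in check_dual_read("X:(R2)-,X0", "X:(R3)+N,Y0"):
    print(problem)
```

Applied to the move fields of Example 6-4, the checker reports five violations, one for each move-related item in the list above.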


6.6.3 Instruction Summary Tables 


A summary of the entire DSP56800 instruction set is presented in this section in tabular form. In these 
tables, Table 6-17 on page 6-18 through Table 6-35 on page 6-30, the instructions are broken into several 
different categories and then listed alphabetically. 


The tables specify the operation, operands, and any relevant comments. There are separate fields for 
sources and destinations of move instructions. There are also two additional fields: 


• C—Time required to execute the instruction
• W—Number of program words occupied by the instruction


Instruction execution times are measured in oscillator clock cycles. This should not be confused with 
instruction cycles, which comprise the timing granularity of the DSP56800 execution units. Each 
instruction cycle is equivalent to two oscillator clock cycles. The numbers given for instruction times 
assume that internal memory—or external memory that requires no wait states—is used. 
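
Because the C column counts oscillator clock cycles, a conversion is needed to compare it with instruction cycles or with elapsed time. The Python sketch below shows the arithmetic. The function names are hypothetical, the 80 MHz oscillator frequency is an arbitrary value chosen for illustration, and zero wait states are assumed, as in the tables.

```python
OSC_CLOCKS_PER_INSTRUCTION_CYCLE = 2  # stated in the text above


def instruction_cycles(c_value):
    """Convert a C-column value (oscillator clocks) to instruction cycles."""
    return c_value // OSC_CLOCKS_PER_INSTRUCTION_CYCLE


def execution_time_ns(c_value, oscillator_hz):
    """Execution time in nanoseconds for a given oscillator frequency."""
    return c_value * 1e9 / oscillator_hz


# A register-to-register MOVE is listed with C = 2, a program memory MOVE with C = 8.
for c in (2, 8):
    print(c, instruction_cycles(c), execution_time_ns(c, 80_000_000))
```

A value of C = 2 therefore corresponds to the one instruction cycle quoted for most register-based instructions earlier in this chapter.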


All parallel move instructions are located in the last two tables in this section:

• Table 6-34 on page 6-29
• Table 6-35 on page 6-30

Table 6-17. Move Word Instructions

Operation        Source            Destination       Comments
MOVE or MOVEC    X:(Rn)            DDDDD             Move signed 16-bit integer word from memory
                 X:(Rn)+
                 X:(Rn)-
                 X:(Rn+N)          DDDDD             Address = Rn + N
                 X:(Rn)+N          DDDDD             Post-update of Rn register
                 X:(R2+xx)         HHHH              xx: offset ranging from 0 to 63
                 X:(Rn+xxxx)       DDDDD             Signed 16-bit offset
                 X:(SP-xx)         HHHH              Unsigned 6-bit offset
                 X:xxxx            DDDDD             Unsigned 16-bit address
MOVE or MOVEP    X:pp or X:<<pp    HHHH              X:pp represents a 6-bit absolute I/O address. Refer to
                                                     I/O Short Address (Direct Addressing): <pp> on page 4-23
MOVE or MOVES    X:aa or X:<aa     HHHH              X:aa represents a 6-bit absolute address. Refer to
                                                     Absolute Short Address (Direct Addressing): <aa> on page 4-22
MOVE or MOVEC    DDDDD             X:(Rn)            Move signed 16-bit integer word to memory
                                   X:(Rn)+
                                   X:(Rn)-
                 DDDDD             X:(Rn+N)          Address = Rn + N
                 DDDDD             X:(Rn)+N          Post-update of Rn register
                 HHHH              X:(R2+xx)         xx: offset ranging from 0 to 63
                 DDDDD             X:(Rn+xxxx)       Signed 16-bit offset
                 HHHH              X:(SP-xx)         Unsigned 6-bit offset
                 DDDDD             X:xxxx            Unsigned 16-bit address
MOVE or MOVEP    HHHH              X:pp or X:<<pp    X:pp represents a 6-bit absolute I/O address. Refer to
                                                     I/O Short Address (Direct Addressing): <pp> on page 4-23
MOVE or MOVES    HHHH              X:aa or X:<aa     X:aa represents a 6-bit absolute address. Refer to
                                                     Absolute Short Address (Direct Addressing): <aa> on page 4-22

Table 6-18. Immediate Move Instructions

Operation        Source    Destination       W   Comments
MOVE or MOVEI    #xx       HHHH              1   Signed 7-bit integer data (data is put in the lowest 7
                                                 bits of the word portion of any accumulator, upper 8
                                                 bits and extension reg are sign extended, LSP portion
                                                 is set to “0”)
                 #xxxx     DDDDD             2   Signed 16-bit immediate data. When LC is the
                                                 destination, use 13-bit values only.
                           X:(R2+xx)         2
                           X:(SP-xx)         2
                           X:xxxx            3
MOVE or MOVEP    #xxxx     X:pp or X:<<pp    2   Move 16-bit immediate data to the last 64 locations
                                                 of X data memory (peripheral registers).
                                                 X:pp represents a 6-bit absolute I/O address.
MOVE or MOVES    #xxxx     X:aa or X:<aa     2   Move 16-bit immediate data to the first 64 locations
                                                 of X data memory.
                                                 X:aa represents a 6-bit absolute address.

Table 6-19. Register-to-Register Move Instructions

Operation        Source    Destination    C   W   Comments
MOVE or MOVEC    DDDDD     DDDDD          2   1   Move signed word to register

Table 6-20. Move Word Instructions—Program Memory

Operation        Source      Destination    C   W   Comments
MOVE or MOVEM    P:(Rj)+     HHHH           8   1   Read signed word from program memory
                 P:(Rj)+N
                 HHHH        P:(Rj)+        8   1   Write word to program memory
                             P:(Rj)+N

Table 6-21. Conditional Register Transfer Instructions

            Data ALU Transfer      AGU Transfer
Operation   Source   Destination   Source   Destination   C   W   Comments
Tcc         DD       F             (No transfer)          2   1   Conditionally transfer one register
            A        B             (No transfer)          2   1
            B        A             (No transfer)          2   1
            DD       F             R0       R1            2   1   Conditionally transfer one data ALU
                                                                  register and one AGU register
            A        B             R0       R1            2   1
            B        A             R0       R1            2   1

Note: The Tcc instruction does not allow the following condition codes: HI, LS, NN, and NR.


Table 6-22. Data ALU Multiply Instructions

Operation   Operands        C   W   Comments
IMPY(16)    Y1,X0,FDD       2   1   Integer 16x16 multiply with 16-bit result.
            Y0,X0,FDD
            Y1,Y0,FDD               When the destination is an accumulator F, the
            Y0,Y0,FDD               F0 portion is unchanged by the instruction.
            A1,Y0,FDD
            B1,Y1,FDD               Note: Assembler also accepts first two operands
                                    when they are specified in opposite order.
MAC         (±)Y1,X0,FDD    2   1   Fractional multiply accumulate; multiplication
            (±)Y0,X0,FDD            result optionally negated before accumulation.
            (±)Y1,Y0,FDD
            (±)Y0,Y0,FDD
            (±)A1,Y0,FDD            Note: Assembler also accepts first two operands
            (±)B1,Y1,FDD            when they are specified in opposite order.
MACR        (±)Y1,X0,FDD    2   1   Fractional MAC with round; multiplication result
            (±)Y0,X0,FDD            optionally negated before addition.
            (±)Y1,Y0,FDD
            (±)Y0,Y0,FDD
            (±)A1,Y0,FDD            Note: Assembler also accepts first two operands
            (±)B1,Y1,FDD            when they are specified in opposite order.
MPY         (±)Y1,X0,FDD    2   1   Fractional multiply where one operand is
            (±)Y0,X0,FDD            optionally negated before multiplication.
            (±)Y1,Y0,FDD
            (±)Y0,Y0,FDD
            (±)A1,Y0,FDD            Note: Assembler also accepts first two operands
            (±)B1,Y1,FDD            when they are specified in opposite order.
MPYR        (±)Y1,X0,FDD    2   1   Fractional multiply where one operand is
            (±)Y0,X0,FDD            optionally negated before multiplication. Result
            (±)Y1,Y0,FDD            is rounded.
            (±)Y0,Y0,FDD
            (±)A1,Y0,FDD            Note: Assembler also accepts first two operands
            (±)B1,Y1,FDD            when they are specified in opposite order.

Table 6-23. Data ALU Extended Precision Multiplication Instructions

Operation   Operands     C   W   Comments
MACSU       X0,Y1,FDD    2   1   Signed or unsigned 16x16 fractional MAC with
            X0,Y0,FDD            32-bit result.
            Y0,Y1,FDD
            Y0,Y0,FDD            The first operand is treated as signed and the
            Y0,A1,FDD            second as unsigned.
            Y1,B1,FDD
MPYSU       X0,Y1,FDD    2   1   Signed or unsigned 16x16 fractional multiply
            X0,Y0,FDD            with 32-bit result.
            Y0,Y1,FDD
            Y0,Y0,FDD            The first operand is treated as signed and the
            Y0,A1,FDD            second as unsigned.
            Y1,B1,FDD

Table 6-24. Data ALU Arithmetic Instructions 
Operation  Operands         C  W  Comments
ABS        F                2  1  Absolute value.
ADC        Y,F              2  1  Add with carry (sets C bit also).
ADD        DD,FDD           2  1  36-bit addition of two registers.
           F1,DD
           ~F,F
           Y,F
           X:(SP-xx),FDD    6  1  Add memory word to register.
           X:aa,FDD         4  1  X:aa represents a 6-bit absolute address. Refer to
           X:xxxx,FDD       6  2  Absolute Short Address (Direct Addressing): <aa>
                                  on page 4-22.
           FDD,X:(SP-xx)    8  2  Add register to memory word, storing the result
           FDD,X:xxxx             back to memory.
           FDD,X:aa         6  2
           #xx,FDD          4  1  Add an immediate integer 0-31.
           #xxxx,FDD        6  2  Add a signed 16-bit immediate.
CLR        F                2  1  Clear 36-bit accumulator and set condition codes.
           F1DD             2  1  Identical to move #0,<reg>; does not set condition
           Rj                     codes.
           N
CMP        DD,FDD           2  1  36-bit compare of two accumulators or data registers.
           F1,DD
           ~F,F
           X:(SP-xx),FDD    6  1  Compare memory word with 36-bit accumulator.
           X:aa,FDD               X:aa represents a 6-bit absolute address. Refer to
           X:xxxx,FDD       6  2  Absolute Short Address (Direct Addressing): <aa>
                                  on page 4-22.
                                  Note: Condition codes set based on 36-bit result.
           #xx,FDD          4  1  Compare accumulator with an immediate integer 0-31.
           #xxxx,FDD        6  2  Compare accumulator with a signed 16-bit immediate.
DEC(W)     FDD              2  1  Decrement word.
           X:(SP-xx)        8  1  Decrement word in memory using appropriate
           X:aa             6  1  addressing mode.
           X:xxxx           8  2
DIV        DD,F             2  1  Divide iteration.
INC(W)     FDD              2  1  Increment word.
           X:(SP-xx)        8  1  Increment word in memory using appropriate
           X:aa             6  1  addressing mode.
           X:xxxx           8  2
NEG        F                2  1  Two’s-complement negation.
RND        F                2  1  Round.
SBC        Y,F              2  1  Subtract with carry (sets C bit also).
SUB        DD,FDD           2  1  36-bit subtract of two registers. 16-bit source registers
           F1,DD                  are first sign extended internally and concatenated
           ~F,F                   with 16 zero bits to form a 36-bit operand.
           Y,F
           X:(SP-xx),FDD    6  1  Subtract memory word from register.
           X:aa,FDD         4  1  X:aa represents a 6-bit absolute address. Refer to
           X:xxxx,FDD       6  2  Absolute Short Address (Direct Addressing): <aa>
                                  on page 4-22.
           #xx,FDD          4  1  Subtract an immediate value 0-31.
           #xxxx,FDD        6  2  Subtract a signed 16-bit immediate.
TFR        DD,F             2  1  Transfer register to register.
           A,B              2  1  Transfer one accumulator to another (36 bits).
           B,A              2  1  Transfer one accumulator to another (36 bits).
TST        F                2  1  Test 36-bit accumulator.
TSTW       DDDDD            2  1  Test 16-bit word in register. All registers allowed
           (except HWS)           except HWS. Limiting is not performed if an
                                  accumulator is specified.
           X:(Rn)           2  1  Test a word in memory using appropriate addressing
           X:(Rn)+                mode.
           X:(Rn)-          2  1  X:aa represents a 6-bit absolute address. Refer to
           X:(Rn+N)         4  1  Absolute Short Address (Direct Addressing): <aa>
           X:(Rn)+N         2  1  on page 4-22.
           X:(Rn+xxxx)      6  2
           X:(R2+xx)        4  1
           X:(SP-xx)        4  1
           X:aa             2  1
           X:pp             2  1
           X:xxxx           4  2
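
The SUB entry in Table 6-24 notes that a 16-bit source register is sign extended and concatenated with 16 zero bits to form a 36-bit operand. The following Python sketch illustrates that formation only; the name to_36bit_operand is hypothetical and is not part of any DSP56800 tool, and the bit layout (4 extension bits, 16 data bits, 16 zero bits) is an assumption drawn from that description.

```python
MASK36 = (1 << 36) - 1  # a 36-bit accumulator-sized value


def to_36bit_operand(word16):
    """Form a 36-bit operand from a 16-bit register value.

    The word is read as two's complement, sign extended, and placed
    above 16 zero bits (assumed layout: extension | word | zeros).
    """
    word16 &= 0xFFFF
    signed = word16 - 0x10000 if word16 & 0x8000 else word16
    return (signed << 16) & MASK36


print(hex(to_36bit_operand(0x4000)))  # 0x40000000
print(hex(to_36bit_operand(0x8000)))  # 0xf80000000
```

A negative word such as $8000 therefore fills the upper four bits with ones, while a positive word leaves them clear.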
Table 6-25. Data ALU Miscellaneous Instructions 
Operation  Operands  C  W  Comments
NORM       R0,F      2  1  Normalization iteration instruction for normalizing
                           the F accumulator

Table 6-26. Data ALU Logical Instructions
Operation  Operands  C  W  Comments
AND        DD,FDD    2  1  16-bit logical AND
           F1,DD
EOR        DD,FDD    2  1  16-bit exclusive OR (XOR)
           F1,DD
NOT        FDD       2  1  One’s-complement (bit-wise negation)
OR         DD,FDD    2  1  16-bit logical OR
           F1,DD


The ANDC, EORC, ORC, and NOTC instructions can also be used to perform logical operations on registers
and data memory locations. ANDC, EORC, and ORC allow logical operations with 16-bit immediate data. See
Section 6.5.1, “ANDC, EORC, ORC, and NOTC Aliases,” for additional information.


Table 6-27. Data ALU Shifting Instructions

Operation  Operands     C  W  Comments
ASL        FDD          2  1  Arithmetic shift left entire register by 1 bit
ASLL       Y1,X0,FDD    2  1  Arithmetic shift left of the first operand by value
           Y0,X0,FDD          specified in four LSBs of the second operand;
           Y1,Y0,FDD          places result in FDD
           Y0,Y0,FDD
           A1,Y0,FDD
           B1,Y1,FDD
ASR        FDD             1  Arithmetic shift right entire register by 1 bit
ASRR       Y1,X0,FDD       1  Arithmetic shift right of the first operand by
           Y0,X0,FDD          value specified in four LSBs of the second
           Y1,Y0,FDD          operand; places result in FDD
           Y0,Y0,FDD
           A1,Y0,FDD
           B1,Y1,FDD
ASRAC      Y1,X0,F      2  1  Arithmetic word shifting with accumulation
           Y0,X0,F
           Y1,Y0,F
           Y0,Y0,F
           A1,Y0,F
           B1,Y1,F
LSL        FDD             1  1-bit logical shift left of word
LSR        FDD             1  1-bit logical shift right of word
LSRR       Y1,X0,FDD       1  Logical shift right of the first operand by value
           Y0,X0,FDD          specified in four LSBs of the second operand;
           Y1,Y0,FDD          places result in FDD (when result is to an
           Y0,Y0,FDD          accumulator F, zero extends into F2)
           A1,Y0,FDD
           B1,Y1,FDD
LSRAC      Y1,X0,F      2  1  Logical word shifting with accumulation
           Y0,X0,F
           Y1,Y0,F
           Y0,Y0,F
           A1,Y0,F
           B1,Y1,F
ROL        FDD          2  1  Rotate 16-bit register left by 1 bit through the
                              carry bit
ROR        FDD          2  1  Rotate 16-bit register right by 1 bit through the
                              carry bit


Table 6-28. AGU Arithmetic Instructions

Operation  Operands   C  W  Comments
LEA        (Rn)+      2  1  Increment the Rn pointer register
           (Rn)-      2  1  Decrement the Rn pointer register
           (Rn)+N     2  1  Add N index register to the Rn register and store the
                            result in the Rn register
           (R2+xx)    2  1  Add a 6-bit unsigned immediate value to R2 and store
                            in the R2 pointer
           (SP-xx)    2  1  Subtract a 6-bit unsigned immediate value from SP and
                            store in the SP register
           (Rn+xxxx)  4  2  Add a 16-bit signed immediate value to the specified
                            source register
TSTW       (Rn)-      2  1  Test and decrement AGU register. Refer to Table 6-24
                            for other forms of TSTW that are executed in the Data
                            ALU.
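
The multi-bit shift instructions in Table 6-27 (ASLL, ASRR, and LSRR) take their shift amount from the four LSBs of the second operand. The Python sketch below illustrates that rule for a 16-bit destination only; it does not model accumulator extension bits or condition codes, and the function names are hypothetical.

```python
def shift_count(operand):
    """The shift amount is the four LSBs of the second operand (0-15)."""
    return operand & 0xF


def asll(value, count_operand):
    """Arithmetic shift left of a 16-bit word, truncated to 16 bits."""
    return (value << shift_count(count_operand)) & 0xFFFF


def asrr(value, count_operand):
    """Arithmetic shift right: the sign bit is replicated from the left."""
    value &= 0xFFFF
    signed = value - 0x10000 if value & 0x8000 else value
    return (signed >> shift_count(count_operand)) & 0xFFFF


def lsrr(value, count_operand):
    """Logical shift right: zeros are shifted in from the left."""
    return (value & 0xFFFF) >> shift_count(count_operand)
```

For the same input $8000 and a shift of 3, asrr gives $F000 while lsrr gives $1000, which is the practical difference between the arithmetic and logical forms.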
Table 6-29. Bit-Manipulation Instructions 
Operation  Operands           C  W  Comments
BFTSTH     #xxxx,DDDDD        4  2  BFTSTH tests all bits selected by the 16-bit
           #xxxx,X:(R2+xx)    6  2  immediate mask. If all selected bits are set,
           #xxxx,X:(SP-xx)    6  2  then the C bit is set. Otherwise it is cleared.
           #xxxx,X:aa         4  2  All registers in DDDDD are permitted except HWS.
           #xxxx,X:pp         4  2  X:aa represents a 6-bit absolute address.
           #xxxx,X:xxxx       6  3  Refer to Absolute Short Address (Direct
                                    Addressing): <aa> on page 4-22.
                                    X:pp represents a 6-bit absolute I/O address.
BFTSTL     #xxxx,DDDDD        4  2  BFTSTL tests all bits selected by the 16-bit
           #xxxx,X:(R2+xx)    6  2  immediate mask. If all selected bits are clear,
           #xxxx,X:(SP-xx)    6  2  then the C bit is set. Otherwise it is cleared.
           #xxxx,X:aa         4  2  All registers in DDDDD are permitted except HWS.
           #xxxx,X:pp         4  2  X:aa represents a 6-bit absolute address.
           #xxxx,X:xxxx       6  3  Refer to Absolute Short Address (Direct
                                    Addressing): <aa> on page 4-22.
                                    X:pp represents a 6-bit absolute I/O address.
BFCHG      #xxxx,DDDDD        4  2  BFCHG tests all bits selected by the 16-bit
           #xxxx,X:(R2+xx)    6  2  immediate mask. If all selected bits are set,
           #xxxx,X:(SP-xx)    6  2  then the C bit is set. Otherwise it is cleared.
           #xxxx,X:aa         4  2  Then it inverts all selected bits.
           #xxxx,X:pp         4  2  All registers in DDDDD are permitted except HWS.
           #xxxx,X:xxxx       6  3  X:aa represents a 6-bit absolute address.
                                    Refer to Absolute Short Address (Direct
                                    Addressing): <aa> on page 4-22.
                                    X:pp represents a 6-bit absolute I/O address.
BFCLR      #xxxx,DDDDD        4  2  BFCLR tests all bits selected by the 16-bit
           #xxxx,X:(R2+xx)    6  2  immediate mask. If all selected bits are set,
           #xxxx,X:(SP-xx)    6  2  then the C bit is set. Otherwise it is cleared.
           #xxxx,X:aa         4  2  Then it clears all selected bits.
           #xxxx,X:pp         4  2  All registers in DDDDD are permitted except HWS.
           #xxxx,X:xxxx       6  3  X:aa represents a 6-bit absolute address.
                                    Refer to Absolute Short Address (Direct
                                    Addressing): <aa> on page 4-22.
                                    X:pp represents a 6-bit absolute I/O address.
BFSET      #xxxx,DDDDD        4  2  BFSET tests all bits selected by the 16-bit
           #xxxx,X:(R2+xx)    6  2  immediate mask. If all selected bits are clear,
           #xxxx,X:(SP-xx)    6  2  then the C bit is set. Otherwise it is cleared.
           #xxxx,X:aa         4  2  Then it sets all selected bits.
           #xxxx,X:pp         4  2  All registers in DDDDD are permitted except HWS.
           #xxxx,X:xxxx       6  3  X:aa represents a 6-bit absolute address.
                                    Refer to Absolute Short Address (Direct
                                    Addressing): <aa> on page 4-22.
                                    X:pp represents a 6-bit absolute I/O address.
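
The test-then-modify behavior described in Table 6-29 can be illustrated with a short Python sketch. It models BFTSTH, BFTSTL, BFCHG, and BFCLR on a single 16-bit word, following the comments in the table; the function names and the (new_value, C) return convention are assumptions made for this illustration, not part of the instruction set definition.

```python
def bftsth(value, mask):
    """Return C = 1 if all bits selected by mask are set in value."""
    return int((value & mask) == mask)


def bftstl(value, mask):
    """Return C = 1 if all bits selected by mask are clear in value."""
    return int((value & mask) == 0)


def bfchg(value, mask):
    """Test as BFTSTH, then invert the selected bits. Returns (new_value, C)."""
    return (value ^ mask) & 0xFFFF, bftsth(value, mask)


def bfclr(value, mask):
    """Test as BFTSTH, then clear the selected bits. Returns (new_value, C)."""
    return value & ~mask & 0xFFFF, bftsth(value, mask)


print(bfclr(0x00FF, 0x000F))  # (240, 1)
```

Note that the carry bit reflects the state of the selected bits before the modification, so a single instruction both reports and updates a flag field.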
Table 6-30. Branch on Bit-Manipulation Instructions 
Operation  Operands               C¹     W  Comments
BRCLR      #MASK8,DDDDD,AA        10/8   2  BRCLR tests all bits selected by the immediate
           #MASK8,X:(R2+xx),AA    12/10  2  mask. If all selected bits are clear, then the carry
           #MASK8,X:(SP-xx),AA    12/10  2  bit is set and a PC relative branch occurs.
           #MASK8,X:aa,AA         10/8   2  Otherwise it is cleared and no branch occurs.
           #MASK8,X:pp,AA         10/8   2  All registers in DDDDD are permitted except HWS.
           #MASK8,X:xxxx,AA       12/10  3  MASK8 specifies a 16-bit immediate value where
                                            either the upper or lower 8 bits contain all zeros.
                                            AA specifies a 7-bit PC relative offset.
                                            X:aa represents a 6-bit absolute address.
                                            X:pp represents a 6-bit absolute I/O address.
BRSET      #MASK8,DDDDD,AA        10/8   2  BRSET tests all bits selected by the immediate
           #MASK8,X:(R2+xx),AA    12/10  2  mask. If all selected bits are set, then the carry bit
           #MASK8,X:(SP-xx),AA    12/10  2  is set and a PC relative branch occurs. Otherwise
           #MASK8,X:aa,AA         10/8   2  it is cleared and no branch occurs.
           #MASK8,X:pp,AA         10/8   2  All registers in DDDDD are permitted except HWS.
           #MASK8,X:xxxx,AA       12/10  3  MASK8 specifies a 16-bit immediate value where
                                            either the upper or lower 8 bits contain all zeros.
                                            AA specifies a 7-bit PC relative offset.
                                            X:aa represents a 6-bit absolute address.
                                            X:pp represents a 6-bit absolute I/O address.

1. First cycle count is if branch is taken (condition is true); second is if branch is not taken.
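
The MASK8 restriction and the branch conditions in Table 6-30 can be checked with a small Python sketch. The function names are hypothetical, and the default cycle counts in branch_cycles are simply the 10/8 register-form values from the table; other operand forms would need their own values.

```python
def is_valid_mask8(mask):
    """True if mask is a 16-bit value whose upper or lower byte is all zeros."""
    if not 0 <= mask <= 0xFFFF:
        return False
    return (mask & 0xFF00) == 0 or (mask & 0x00FF) == 0


def brset_taken(value, mask):
    """BRSET branches when all bits selected by the mask are set."""
    return (value & mask) == mask


def brclr_taken(value, mask):
    """BRCLR branches when all bits selected by the mask are clear."""
    return (value & mask) == 0


def branch_cycles(taken, cycles_taken=10, cycles_not_taken=8):
    """Pick the first cycle count if the branch is taken, else the second."""
    return cycles_taken if taken else cycles_not_taken
```

A mask such as $0101 is rejected because it has nonzero bits in both bytes, whereas $00C0 and $8100 are both acceptable.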


Table 6-31. Change of Flow Instructions 


Operation  Operands  C¹   W  Comments
Bcc        xx        6/4  1  7-bit signed PC relative offset. (xx <=> <OFFSET7>)
BRA        xx        6    1  7-bit signed PC relative offset. (xx <=> <OFFSET7>)
Jcc        xxxx      6/4  2  16-bit absolute address
JMP        xxxx      6    2  16-bit absolute address
JSR        xxxx      8    2  Push 16-bit return address and jump to 16-bit target address
RTI                  10   1  Return from interrupt, restoring 16-bit PC and SR from the
                             stack
RTS                  10   1  Return from subroutine, restoring 16-bit PC from the stack


1. First cycle count is if branch is taken (condition is true); second is if branch is not taken.


Table 6-32. Looping Instructions 


Operation  Operands    C  W  Comments
DO         #xx,xxxx    6  2  Load LC register with unsigned value and start hardware
                             DO loop with 6-bit immediate loop count. The last address
                             is 16-bit absolute. #xx = 0 not allowed by assembler.
           DDDDD,xxxx  6  2  Load LC register with unsigned value. If LC is not equal to
                             zero, start hardware DO loop with 16-bit loop count in
                             register. Otherwise, skip body of loop (adds three additional
                             cycles). The last address is 16-bit absolute.
                             Any register allowed except: SP, M01, SR, OMR, and HWS.
ENDDO                  2  1  Remove one value from the hardware stack and update the
                             NL and LF bits appropriately.
                             Note: Does not branch to the end of the loop.
REP        #xx         6  1  Hardware repeat of a one-word instruction with immediate
                             loop count.
           DDDDD       6  1  Hardware repeat of a one-word instruction with loop count
                             specified in register.
                             Any register allowed except: SP, M01, SR, OMR, and HWS.
Table 6-33. Control Instructions 
Operation  Operands  C    W  Comments
DEBUG                4    1  Generate a debug event.
ILLEGAL              4    1  Execute the illegal instruction exception. This instruction is made
                             available so that code may be written to test and verify interrupt
                             handlers for illegal instructions.
NOP                  2    1  No operation.
STOP                 n/a  1  Enter STOP low-power mode.
SWI                  8    1  Execute the trap exception at the highest interrupt priority level,
                             level 1 (non-maskable).
WAIT                 n/a  1  Enter WAIT low-power mode.


Table 6-34. Data ALU Instructions—Single Parallel Move

Data ALU Operation           Parallel Memory Move
Operation      Operands      Source      Destination
MAC            Y1,X0,F       X:(Rj)+     X0
MPY            Y0,X0,F       X:(Rj)+N    Y1
MACR           Y1,Y0,F                   Y0
MPYR           Y0,Y0,F                   A
               A1,Y0,F                   B
               B1,Y1,F                   A1
                                         B1
ADD            X0,F          X0          X:(Rj)+
SUB            Y1,F          Y1          X:(Rj)+N
CMP            Y0,F          Y0
TFR            A,B           A
               B,A           B
ABS            F             A1
ASL                          B1
ASR
CLR
RND
TST
INC or INCW
DEC or DECW
NEG


Each instruction in Table 6-34 requires one program word and executes in one cycle. The data type 
accessed by the single memory move in all single parallel move instructions is signed word. 


The solid double line running down the center of the table indicates that the data ALU operation is 
independent from the parallel memory move. As a result, any valid operation can be combined with any 
valid memory move. Example 6-5 lists examples of valid single parallel move instructions. 


Example 6-5. Examples of Single Parallel Moves 


MAC   Y1,X0,A    X:(R0)+,X0
MAC   Y1,X0,A    X0,X:(R0)+
ASL   B          X:(R0)+,Y1
ASL   B          Y1,X:(R0)+


It is not permitted to perform MAC A,B X:(R0)+,X0 because the MAC instruction requires three
operands, as shown in Table 6-34. The operands are not independent of the operation performed. This is
why a single line is used to separate the operation from the operands instead of a double line.


For the MAC, MPY, MACR, and MPYR instructions, the assembler accepts the two source operands in
any order.


Table 6-35. Data ALU Instructions—Dual Parallel Read 


Data ALU Operation       First Memory Read            Second Memory Read
Operation  Operands      Source 1    Destination 1    Source 2    Destination 2
MAC        Y1,X0,F       X:(R0)+     Y0               X:(R3)+     X0
MPY        Y1,Y0,F       X:(R0)+N    Y1               X:(R3)-
MACR       Y0,X0,F       X:(R1)+
MPYR                     X:(R1)+N
ADD        X0,F
SUB        Y1,F
           Y0,F
MOVE


Each instruction in Table 6-35 requires one program word and executes in one cycle. 


The data types accessed by the two memory moves in all dual parallel read instructions are signed words. 


6.7 The Instruction Pipeline 


Instruction execution is pipelined to allow most instructions to execute at a rate of one instruction every 
two clock cycles. However, certain instructions require additional time to execute, including instructions 
with the following properties: 


• Exceed length of one word
• Use an addressing mode that requires more than one cycle
• Access the program memory
• Cause a control flow change


In the case of a control flow change, a cycle is needed to clear the pipeline. 


6.7.1 Instruction Processing 


Pipelining allows the fetch-decode-execute operations of an instruction to occur during the 
fetch-decode-execute operations of other instructions. While an instruction is executed, the next instruction 
to be executed is decoded, and the instruction to follow the instruction being decoded is fetched from 
program memory. If an instruction is two words in length, the additional word will be fetched before the 
next instruction is fetched. 


Figure 6-4 demonstrates pipelining; F1, D1, and E1 refer to the fetch, decode, and execute operations,
respectively, of the first instruction. Note that the third instruction contains an instruction extension word
and takes two cycles to execute.


Fetch              F1   F2   F3   F3e  F4   F5   F6
Decode                  D1   D2   D3   D3e  D4   D5
Execute                      E1   E2   E3   E3e  E4
Instruction Cycle  1    2    3    4    5    6    7


Figure 6-4. Pipelining 


Each instruction requires a minimum of three instruction cycles (six machine cycles) to be fetched,
decoded, and executed. A new instruction may be started after two machine cycles, giving a throughput
of one instruction executed every instruction cycle for single-cycle instructions. Two-word
instructions require a minimum of eight machine cycles to execute, and a new instruction may start after
four machine cycles.
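

The timing in Figure 6-4 can be reproduced with a short Python sketch. It is a simplified model that assumes no wait states and no control-flow changes, and that each additional instruction word occupies one extra slot in every stage; the function names are hypothetical.

```python
MACHINE_CYCLES_PER_INSTRUCTION_CYCLE = 2


def pipeline_schedule(words_per_instruction):
    """Return (fetch, decode, execute) start cycles, 1-based, per instruction."""
    schedule = []
    fetch = 1
    for words in words_per_instruction:
        schedule.append((fetch, fetch + 1, fetch + 2))
        fetch += words  # an extension word takes one more fetch slot
    return schedule


def total_instruction_cycles(words_per_instruction):
    """Instruction cycles until the last instruction finishes executing."""
    return sum(words_per_instruction) + 2  # two cycles to fill the pipeline


# Instructions 1 to 4 of Figure 6-4; the third has an extension word.
print(pipeline_schedule([1, 1, 2, 1]))
# [(1, 2, 3), (2, 3, 4), (3, 4, 5), (5, 6, 7)]
```

Multiplying total_instruction_cycles by two gives machine cycles: six for a lone one-word instruction and eight for a lone two-word instruction, matching the figures quoted above.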


6.7.2 Memory Access Processing 


One or more of the DSP memory sources (X data memory and program memory) may be accessed during
the execution of an instruction. Three address buses (XAB1, XAB2, and PAB) and three data buses
(CGDB, XDB2, and PDB) are available for internal memory accesses during one instruction cycle, but
only one address bus and one data bus are available for external memory accesses (when the external bus is
available). If all memory sources are internal to the DSP, one or both of the memory sources may be
accessed in one instruction cycle (that is, program memory access, or program memory access plus an X
memory reference, or program memory access with two X memory references).


NOTE: 


For instructions that contain two X memory references, the second transfer 
using XAB2 and XDB2 may not access external memory. All accesses 
across these buses must access internal memory only. 


See Section 7.2.2, “Instruction Pipeline with Off-Chip Memory Accesses,” on page 7-3 for a discussion of
off-chip memory accesses.


Chapter 7
Interrupts and the Processing States


The DSP56800 Family processors have six processing states and are always in one of these states (see 
Table 7-1). Each processing state is described in detail in the following sections except the debug 
processing state, which is discussed in Section 9.3, “OnCE Port,” on page 9-4. In addition, special cases of 
interrupt pipelines are discussed at the end of the section. Section 8.10, “Interrupts,” on page 8-30 
discusses software techniques for interrupt processing. 


Table 7-1. Processing States 


State      Description
Reset      The state where the DSP core is forced into a known reset state. Typically, the first
           program instruction is fetched upon exiting this state.
Normal     The state of the DSP core where instructions are normally executed.
Exception  The state of interrupt processing, where the DSP core transfers program control from its
           current location to an interrupt service routine using the interrupt vector table.
Wait       A low-power state where the DSP core is shut down but the peripherals and interrupt
           machine remain active.
Stop       A low-power state where the DSP core, the interrupt machine, and most (if not all) of the
           peripherals are shut down.
Debug      The state where the DSP core is halted and all registers in the On-Chip Emulation
           (OnCE) port of the processor are accessible for program debug.


7.1 Reset Processing State 


The processor enters the reset processing state when the external RESET pin is asserted and a hardware 
reset occurs. On devices with a computer operating properly (COP) timer, it is also possible to enter the 
reset processing state when this timer reaches zero. The DSP is typically held in reset during the power-up 
process through assertion of the RESET pin, making this the first processing state entered by the DSP. The 
reset state performs the following: 


1. Resets internal peripheral devices

2. Sets the M01 modifier register to $FFFF

3. Clears the interrupt priority register (IPR)

4. Sets the wait state fields in the bus control register (BCR) to their maximum value, thereby
inserting the maximum number of wait states for all external memory accesses

5. Clears the status register’s (SR) loop flag and condition code bits and sets the interrupt
mask bits

6. Clears the following bits in the operating mode register: nested looping, condition codes,
stop delay, rounding, and external X memory


The DSP remains in the reset state until the RESET pin is deasserted. When hardware deasserts the RESET 
pin, the following occur: 


1. The chip operating mode bits in the OMR are loaded from an external source, typically mode 
select pins; see the appropriate device manual for details. 


2. A delay of 16 instruction cycles (NOPs) occurs to sync the local clock generator and state 
machine. 


3. The chip begins program execution at the program memory address defined by the state of 
the MA and MB bits in the OMR and the type of reset (hardware or COP time-out). The 
first instruction must be fetched and then decoded before execution. Therefore, the first 
instruction execution is two instruction cycles after the first instruction fetch. 


After this last step, the DSP enters the normal processing state upon exiting reset. It is also possible for the 
DSP to enter the debug processing state upon exiting reset when system debug is underway. 


7.2 Normal Processing State 


The normal processing state is the typical state of the processor where it executes instructions in a 
three-stage pipeline. This includes the execution of simple instructions such as moves or ALU operations 
as well as jumps, hardware looping, bit-field instructions, instructions with parallel moves, and so on. 
Details about the execution of the individual instructions can be found in Appendix A, “Instruction Set 
Details.” The chip must be reset before it can enter the normal processing state. 


7.2.1 Instruction Pipeline Description 


The instruction-execution pipeline is a three-stage pipeline, which allows most instructions to execute at a 
rate of one instruction per instruction cycle. For the case where there are no off-chip memory accesses, or 
for the case of a single off-chip access with no wait states, one instruction cycle is equivalent to two 
machine cycles. A machine cycle is defined as one cycle of the clock provided to the DSP core. Certain 
instructions, however, require more than one instruction cycle to execute. These instructions include the 
following: 


• Instructions longer than one word
• Instructions using an addressing mode that requires more than one cycle
• Instructions that cause a control-flow change


Pipelining allows instruction executions to overlap so that the fetch-decode-execute operations of a given
instruction occur concurrently with the fetch-decode-execute operations of other instructions. Specifically,
while the processor is executing one instruction, it is decoding the next instruction and fetching a third
instruction from program memory. The processor fetches only one instruction word per instruction cycle;
if an instruction is two words in length, it fetches the additional word with an additional cycle before it
fetches the next instruction.


Table 7-2. Instruction Pipelining

               Instruction Cycle
Operation  1    2    3    4    5    6    7    ...
Fetch      F1   F2   F3   F3e  F4   F5   F6   ...
Decode          D1   D2   D3   D3e  D4   D5   ...
Execute              E1   E2   E3   E3e  E4   ...


Table 7-2 demonstrates pipelining. “F1,” “D1,” and “E1” refer to the fetch, decode, and execute operations 
of the first instruction, respectively. The third instruction, which contains an instruction extension word, 
takes two instruction cycles to execute. Although it takes three instruction cycles (six machine cycles) for 
the pipeline to fill and the first instruction to execute, an instruction usually executes on each instruction 
cycle thereafter (two machine cycles). 


7.2.2 Instruction Pipeline with Off-Chip Memory Accesses 


The three sets of internal on-chip address and data buses (XAB1/CGDB, XAB2/XDB2, PAB/PDB) allow
for fast memory access when memories are being accessed on-chip. The DSP can perform memory
accesses on all three bus pairs in a single instruction cycle, permitting the fetch of an instruction
concurrently with up to two accesses to the X data memory. Thus, for applications where all program and
data are located in on-chip memory, there is no speed penalty when performing up to three memory accesses
in a single instruction.


Similarly, the external address and data bus also allows for fast program execution. For the case where
only program memory is external to the chip or only X data memory is external (XAB1/CGDB bus pair),
the DSP chip will still execute programs at full speed if there are no wait states programmed on the
external bus by the user. For the case where an instruction requires an external program fetch and an
external X data memory access simultaneously, the instruction will still operate correctly. The instruction
is automatically stretched an additional instruction cycle so that the two external accesses may be
performed correctly, and wait states are inserted accordingly. All this occurs transparently to the user to
allow for easier program development.


This information is summarized in Table 7-3, which shows how the chip automatically inserts instruction
cycles and wait states for an instruction that is simultaneously accessing program and data memory. For
dual parallel read instructions, the second X memory access that uses XAB2/XDB2 must always be done
to on-chip memory. This second access may never access external off-chip memory.


Table 7-3. Additional Cycles for Off-Chip Memory Accesses

             Memory Space
Program    X Memory      X Memory       Number of
Fetch      First Access  Second Access  Additional Cycles  Comments
On-chip    On-chip       On-chip        0                  All accesses internal
External   On-chip       On-chip        0 + mvm            One external access
On-chip    External      On-chip        0 + mv             One external access
External   External      On-chip        1 + mv + mvm       Two external accesses

Note: The ‘mv’ and ‘mvm’ cycle time values reflect the additional time required for all MOVE instructions and for
MOVEM instructions, respectively.
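

Table 7-3 can be read as a small lookup. The Python sketch below encodes its four rows; mv and mvm are treated as caller-supplied values because their magnitudes depend on the programmed wait states, and the function name and argument names are hypothetical.

```python
def additional_cycles(program_external, x_first_external,
                      x_second_external=False, mv=0, mvm=0):
    """Additional cycles for one instruction, following Table 7-3.

    mv and mvm are the extra times for MOVE and MOVEM accesses; they are
    inputs here because they depend on the configured wait states.
    """
    if x_second_external:
        # The XAB2/XDB2 access must always go to on-chip memory.
        raise ValueError("second X memory access must be on-chip")
    if program_external and x_first_external:
        return 1 + mv + mvm  # instruction is stretched by one cycle
    if program_external:
        return mvm
    if x_first_external:
        return mv
    return 0
```

With no wait states (mv = mvm = 0), only the row with two external accesses costs an extra cycle, which matches the statement that a single external access still runs at full speed.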


7.2.3 Instruction Pipeline Dependencies and Interlocks 


The pipeline is normally transparent to the user. However, there are certain instruction-sequence 
combinations where the pipeline will affect the program execution. Such situations are best described by 
case studies. Most of these restricted sequences occur because either all addresses are formed during 
instruction decode or they are the result of contention for an internal resource such as the SR. 


If the execution of an instruction depends on the relative location of the instruction in a sequence of 
instructions, there is a pipeline effect. 


It is possible to see if there is a pipeline dependency. To test for a suspected pipeline effect, compare the 
execution of the suspect instruction when it directly follows the previous instruction and when four NOPs 
are inserted between the two. If there is a difference, it is caused by a pipeline effect. The assembler flags 
instruction sequences with potential pipeline effects so that the user can determine if the operation will 
execute as expected. 


Example 7-1. Pipeline Dependencies in Similar Code Sequences 


No Pipeline Effect 


ORC #$0001,SR ; Changes carry bit at the end of execution time slot 
JCS LABEL ; Reads condition codes in SR in its 
; execution time slot 


The JCS instruction will test the carry bit modified by the ORC without any pipeline effect in this code segment. 
Pipeline Effect 


ORC #$0008,OMR ; Sets EX bit at execution time slot
MOVE X:$17,A ; Reads internal memory instead of external 
; memory 


A pipeline effect occurs because the address of the MOVE is formed at its decode time before the ORC changes the 
EX bit (which changes the memory map) in the ORC’s execution time slot. The following code produces the expected 
results of reading the external ROM: 


ORC #$0008,OMR ; Sets EX bit at execution time slot
NOP ; Delays the MOVE so it will read the updated memory map
MOVE X:$17,A ; Reads external memory


Example 7-2. Common Pipeline Dependency Code Sequence


MOVE X0,R2 ; Move a value into register R2
MOVE X:(R2),A ; Uses the OLD contents of R2 to address memory.


In this case, before the first MOVE instruction has written R2 during its execution cycle, the second MOVE has
accessed the old R2, using the old contents of R2. This is because the address for indirect moves is formed during
the decode cycle. This overlapping instruction execution in the pipeline causes the pipeline effect.
After an address register has been written by a MOVE instruction, one instruction cycle should be allowed before the
new contents are available for use as an address register by another MOVE instruction. The proper instruction
sequence follows:


MOVE X0,R2 ; Moves a number into register R2
NOP ; Executes any instruction or instruction sequence not
 ; using the R2 register written in the previous
 ; instruction
MOVE X:(R2),A ; Uses the new contents of R2


Section 4.4, “Pipeline Dependencies,” on page 4-33 contains more details on interlocks caused during 
address generation. 
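

The dependency in Example 7-2 can be reproduced with a toy model of the three-stage pipeline. The Python sketch below assumes one-word instructions, forms the address of an indirect move in its decode cycle, and writes registers in the execute cycle; the operation names MOVE_TO_REG, MOVE_FROM_MEM, and NOP are hypothetical labels for this model and are not assembler mnemonics.

```python
def run(program, regs, memory):
    """Simulate decode-time address formation; return the values read."""
    regs = dict(regs)
    latched = {}   # instruction index -> address formed at decode
    results = []
    n = len(program)
    for cycle in range(n + 2):
        d = cycle - 1  # index of the instruction in decode
        e = cycle - 2  # index of the instruction in execute
        # Decode reads the address register before execute writes it.
        if 0 <= d < n and program[d][0] == "MOVE_FROM_MEM":
            latched[d] = regs[program[d][1]]
        if 0 <= e < n:
            op = program[e]
            if op[0] == "MOVE_TO_REG":
                regs[op[1]] = op[2]
            elif op[0] == "MOVE_FROM_MEM":
                results.append(memory[latched[e]])
    return results


memory = {0x10: "old", 0x20: "new"}
hazard = [("MOVE_TO_REG", "R2", 0x20), ("MOVE_FROM_MEM", "R2")]
fixed = [("MOVE_TO_REG", "R2", 0x20), ("NOP",), ("MOVE_FROM_MEM", "R2")]
print(run(hazard, {"R2": 0x10}, memory))  # ['old']
print(run(fixed, {"R2": 0x10}, memory))   # ['new']
```

The only difference between the two runs is the intervening instruction, which is the same comparison the text recommends for detecting a suspected pipeline effect.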


7.3 Exception Processing State 


The exception processing state is the state where the DSP core recognizes and processes interrupts that can 
be generated by conditions inside the DSP or from external sources. Upon the occurrence of an event, 
interrupt processing transfers control from the currently executing program to an interrupt service routine, 
with the ability to later return to the current program upon completion of the interrupt service routine. In 
digital signal processing, some of the main uses of interrupts are to transfer data between DSP memory and 
a peripheral device or to begin execution of a DSP algorithm upon reception of a new sample. An interrupt 
can also be used to exit the DSP’s low-power wait processing state. 


An interrupt will cause the processor to enter the exception processing state. Upon entering this state, the 
current instruction in decode executes normally. The next fetch address is supplied by the interrupt 
controller and points into the interrupt vector table (Table 7-4 on page 7-7). During this fetch the PC is not 
updated. The instruction located at these two addresses in the interrupt vector table must always be a 
two-word, unconditional jump-to-subroutine instruction (JSR). Note that the interrupt controller only 
fetches the second word of the JSR instruction. This results in the program changing flow to an interrupt 
routine, and a context switch is performed. 


There are many sources for interrupts on the DSP56800 Family of chips, and some of these sources can 
generate more than one interrupt. Interrupt requests can be generated from conditions within the DSP core, 
from the DSP peripherals, or from external pins. The DSP core features a prioritized interrupt vector 
scheme with up to 64 vectors to provide faster interrupt servicing. The interrupt priority structure is 
discussed in Section 7.3.3, “Interrupt Priority Structure.” 


7.3.1 Sequence of Events in the Exception Processing State 


The following steps occur in exception processing: 


1. A request for an interrupt is generated either on a pin, from the DSP core, from a peripheral
on the DSP chip, or from an instruction executed by the DSP core. Any hardware interrupt
request from a pin is first synchronized with the DSP clock.


2. The request for an interrupt by a particular source is latched in an interrupt-pending flag if
it is an edge or non-maskable interrupt (all other interrupts are not latched and must remain
asserted in order to be serviced). For peripherals that can generate more than one interrupt
request and have more than one interrupt vector, the interrupt arbiter only sees one request
from the peripheral active at a time.


3. All pending interrupt requests are arbitrated to select which interrupt will be processed. The 
arbiter automatically ignores any interrupts with an interrupt priority level (IPL) lower than 
the interrupt mask level specified in the SR. If there are any remaining requests, the arbiter 
selects the remaining interrupt with the highest IPL, and the chip enters the exception 
processing state (see Figure 7-1). 


4. The interrupt controller then freezes the program counter (PC) and fetches the JSR 
instruction located at the two interrupt vector addresses associated with the selected 
interrupt. It is required that the instruction located at the interrupt vector address must be a 
two-word JSR instruction. Note that only the second word of the JSR instruction is fetched; 
the first word of the JSR is provided by the interrupt controller. 


5. The interrupt controller places this JSR instruction into the instruction stream and then 
releases the PC, which is used for the next instruction fetch. Arbitration among the 
remaining interrupt requests is allowed to resume. The next interrupt arbitration then 
begins. 


6. The execution of the JSR instruction stacks the PC and the SR as it transfers control to the 
first instruction in the interrupt service routine. These two stacked registers contain the 
16-bit return address that will later be used to return to the interrupted code, as well as the 
condition code state. In addition, the IPL is raised to level 1 to disallow any level 0 
interrupts. Note that the OnCE trap, stack error, illegal instruction, and SWI can still 
generate interrupts because these are level 1 interrupts and are non-maskable. 
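

The arbitration rule in step 3 can be sketched in a few lines of Python. The model below drops requests whose IPL is lower than the mask level and picks the highest remaining IPL; breaking ties by list order is an assumption of this sketch (the actual ordering within a level is covered in Section 7.3.3), and the function name is hypothetical.

```python
def arbitrate(pending, mask_level):
    """Select one pending interrupt request, or None.

    pending is a list of (name, ipl) pairs. Requests with an IPL lower
    than mask_level are ignored; the highest remaining IPL wins.
    Ties go to the earliest entry in the list (an assumption).
    """
    eligible = [req for req in pending if req[1] >= mask_level]
    if not eligible:
        return None
    return max(eligible, key=lambda req: req[1])[0]


print(arbitrate([("IRQA", 0), ("SWI", 1)], mask_level=0))  # SWI
print(arbitrate([("IRQA", 0)], mask_level=1))              # None
```

The second call mirrors step 6: once the IPL has been raised to level 1 inside a service routine, level 0 requests are no longer selected.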


The exception processing state is completed when the processor executes the JSR instruction located in the
interrupt vector table and the chip enters the normal processing state. As it enters the normal processing
state, it begins executing the first instruction in the interrupt service routine. Each interrupt service routine
should return to the main program by executing an RTI instruction.


Interrupt routines for level 0 interrupts are interruptible by higher priority interrupts. Figure 7-1 shows an 
example of processing an interrupt. 


[Figure 7-1: the main program, starting at $0100, is interrupted when an SSI receive data with
exception status interrupt is recognized. The JSR instruction in the vector table transfers control to the
interrupt service routine, and an explicit return from interrupt resumes the main program.]


Figure 7-1. Interrupt Processing


Steps 1 through 3 listed on page 7-5 require two additional instruction cycles, effectively making the
interrupt pipeline five levels deep.


7.3.2 Reset and Interrupt Vector Table 


The interrupt vector table specifies the addresses that the processor accesses once it recognizes an interrupt
and begins exception processing. Since peripherals can also generate interrupts, the interrupt vector map
for a given chip is determined by all sources on the DSP core as well as all peripherals that can generate an
interrupt. Table 7-4 lists the reset and interrupt vectors available on DSP56800-based DSP chips. The
interrupt vectors used by on-chip peripherals, or by additional device-specific interrupts, are listed in the
user’s manual for that chip.


Table 7-4. DSP56800 Core Reset and Interrupt Vector Table


Interrupt            Interrupt
Starting Address     Priority Level     Interrupt Source

$0000                -                  Hardware Reset
$0002                -                  COP Watchdog Reset
$0004                -                  (Reserved)
$0006                1                  Illegal Instruction Trap
$0008                1                  SWI
$000A                1                  Hardware Stack Overflow
$000C                1                  OnCE Trap
$000E                1                  (Reserved)
$0010                0                  IRQA
$0012                0                  IRQB
$0014                0                  (Vector Available for On-Chip Peripherals)
$0016                0                  (Vector Available for On-Chip Peripherals)
$0018                0                  (Vector Available for On-Chip Peripherals)
$001A                0                  (Vector Available for On-Chip Peripherals)
$001C                0                  (Vector Available for On-Chip Peripherals)
$001E                0                  (Vector Available for On-Chip Peripherals)
$0020                0                  (Vector Available for On-Chip Peripherals)
$007C                0                  (Vector Available for On-Chip Peripherals)
$007E                0                  (Vector Available for On-Chip Peripherals)


It is required that a two-word JSR instruction is present in any interrupt vector location that may be fetched
during exception processing. If an interrupt vector location is unused, then the JSR instruction is not
required.


The hardware reset and COP reset are special cases because they are reset vectors, not interrupt vectors. 
There is no IPL specified for these two because these conditions reset the chip and reset takes precedence 
over any interrupt. Typically a two-word JMP instruction is used in the reset vectors. The hardware reset 
vector will either be at address $0000 or $E000 and the COP reset vector will either be at $0002 or $E002 
depending on the operating mode of the chip. The different operating modes are discussed in
Section 5.1.9.1, “Operating Mode Bits (MB and MA)—Bits 1-0,” on page 5-10.
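

As an illustration of how the vector table is laid out, the following Python sketch transcribes the core entries of Table 7-4 and returns the two program-memory words that hold the JSR for a given source. The function names and the `high_vectors` flag are hypothetical; the flag merely stands in for whichever operating mode places the reset vectors at $E000.

```python
# Core entries transcribed from Table 7-4.
# Key: first vector address. Value: (IPL, source); IPL is None for resets.
CORE_VECTORS = {
    0x0000: (None, "Hardware Reset"),
    0x0002: (None, "COP Watchdog Reset"),
    0x0006: (1, "Illegal Instruction Trap"),
    0x0008: (1, "SWI"),
    0x000A: (1, "Hardware Stack Overflow"),
    0x000C: (1, "OnCE Trap"),
    0x0010: (0, "IRQA"),
    0x0012: (0, "IRQB"),
}


def vector_words(source):
    """Return the two program-memory addresses of the vector for `source`."""
    for base, (_ipl, name) in CORE_VECTORS.items():
        if name == source:
            return base, base + 1   # each vector holds one two-word instruction
    raise KeyError(source)


def reset_vector(cop=False, high_vectors=False):
    """Reset vector address; `high_vectors` is a hypothetical mode flag."""
    base = 0xE000 if high_vectors else 0x0000
    return base + (2 if cop else 0)
```

For example, `vector_words("IRQA")` yields the pair $0010 and $0011, and `reset_vector(cop=True, high_vectors=True)` yields $E002.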


7.3.3 Interrupt Priority Structure 


Interrupts are organized in a simple priority structure. Each interrupt source has an associated IPL: Level 0 
or Level 1. Level 0, the lowest level, is maskable, and Level 1 is non-maskable. Table 7-5 summarizes the 
priority levels and their associated interrupt sources. 


Table 7-5. Interrupt Priority Level Summary


IPL    Description      Interrupt Sources
0      Maskable         On-chip peripherals, IRQA and IRQB
1      Non-maskable     Illegal instruction, OnCE trap, HWS overflow, SWI


The interrupt mask bits (I1, I0) in the SR reflect the current priority level and indicate the IPL needed for
an interrupt source to interrupt the processor (see Table 7-6). Interrupts are inhibited for all priority levels
below the current processor priority level. Level 1 interrupts, however, are not maskable and, therefore,
can always interrupt the processor.


Table 7-6. Interrupt Mask Bit Definition in the Status Register


I1    I0    Exceptions Permitted    Exceptions Masked
0     0     (Reserved)              (Reserved)
0     1     IPL 0, 1                None
1     0     (Reserved)              (Reserved)
1     1     IPL 1                   IPL 0
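

The encoding in Table 7-6 can be expressed as a small decoder. This Python sketch is only an illustration: the bit positions used for I1 and I0 (bits 9 and 8 of the SR) are an assumption that should be checked against the status register description, and the function name is hypothetical.

```python
I0_BIT = 8  # assumed position of I0 in the SR
I1_BIT = 9  # assumed position of I1 in the SR

# (I1, I0) -> lowest IPL that is still allowed to interrupt (Table 7-6).
MIN_IPL = {(0, 1): 0, (1, 1): 1}


def permitted_ipls(sr):
    """Return the list of IPLs permitted by the mask bits in `sr`."""
    i1 = (sr >> I1_BIT) & 1
    i0 = (sr >> I0_BIT) & 1
    if (i1, i0) not in MIN_IPL:
        raise ValueError("reserved I1/I0 encoding")
    return [ipl for ipl in (0, 1) if ipl >= MIN_IPL[(i1, i0)]]
```

With these assumed positions, an SR value of $0100 permits IPL 0 and 1, while $0300 permits only IPL 1.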


7.3.4 Configuring Interrupt Sources 


The interrupt unit in the DSP56800 core supports seven interrupt channels for use by on-chip peripherals, 
in addition to the IRQ interrupts and interrupts generated by the DSP core. Each maskable interrupt source 
can individually be enabled or disabled as required by the application. The exact method for doing so is 
dependent on the particular DSP56800-based device, as some of the interrupt handling logic is
implemented as an on-chip peripheral. 


One example of how interrupts can be enabled and disabled, and their priority level established, is with an
interrupt priority register (IPR).


[Figure 7-2: register bit diagram showing the IRQB Mode field, reserved bits, and the
Channel 6 IPL through Channel 0 IPL fields.]

* Indicates reserved bits, read as zero and should be written with zero for future compatibility


Figure 7-2. Example Interrupt Priority Register 


In the example interrupt priority register (IPR), shown in Figure 7-2, the interrupt for each on-chip
peripheral device (channels 0-6) and for each external interrupt source (IRQA, IRQB), can be enabled or 
disabled under software control. The IPR also specifies the trigger mode of the external interrupt sources. 
Figure 7-3 shows how it might be programmed for different interrupts. 


Chx    Enabled?    IPL
0      No          —
1      Yes         0

IBL1, IAL1    Enabled?    IPL
0             No          —
1             Yes         0

IRQ mode bit    Trigger Mode
0               Level sensitive
1               Edge sensitive


Figure 7-3. Example On-Chip Peripheral and IRQ Interrupt Programming 
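

To make the enable bits of the example IPR concrete, the Python sketch below builds and decodes an IPR value. The bit positions come from the "Enabled By" column of Table 7-7; the function names are hypothetical, and the trigger-mode bits are deliberately left out because their positions are not given in this section.

```python
# Enable-bit positions from the "Enabled By" column of Table 7-7.
IPR_ENABLE_BIT = {
    "IRQA": 1, "IRQB": 4,
    "CH6": 9, "CH5": 10, "CH4": 11, "CH3": 12,
    "CH2": 13, "CH1": 14, "CH0": 15,
}


def ipr_enable_mask(sources):
    """Return an IPR value with the enable bit set for each named source."""
    value = 0
    for name in sources:
        value |= 1 << IPR_ENABLE_BIT[name]
    return value


def enabled_sources(ipr):
    """Return the sorted names of the sources whose enable bit is set."""
    return sorted(n for n, b in IPR_ENABLE_BIT.items() if ipr & (1 << b))
```

Enabling IRQA and channel 0, for instance, gives `ipr_enable_mask(["IRQA", "CH0"]) == 0x8002`. On a real device the value would be written to the IPR of that device as described in its user's manual.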


7.3.5 Interrupt Sources 


An interrupt request is a request to break out of currently executing code to enter an interrupt service 
routine. Interrupt requests in the DSP are generated from one of three sources: external hardware, internal 
hardware, and internal software. The internal hardware interrupt sources include all of the on-chip 
peripheral devices. 


Each interrupt source has at least one associated interrupt vector, and some sources may have several 
interrupt vectors. The interrupt vector addresses for each interrupt source are listed in the interrupt vector 
table (Table 7-4). These addresses are usually located in either the first 64 or 128 locations of program 
memory. For further information on a device’s on-chip peripheral interrupt sources, see the device’s
individual user’s manual.


When an interrupt request is recognized and accepted by the DSP core, a two-word JSR instruction is
fetched from the interrupt vector table. Because the program flow is directed to a different starting address
within the table for each different interrupt, the interrupt structure can be described as “vectored.” A
vectored interrupt structure has low execution overhead. If it is known beforehand that certain interrupts
will not be used or enabled, those locations within the table can instead be used for program or data
storage.


7.3.5.1 External Hardware Interrupt Sources 


The external hardware interrupt sources are listed below:

• RESET pin
• IRQA pin—priority level 0
• IRQB pin—priority level 0


An assertion of the RESET pin is not truly an interrupt; rather, it forces the chip into the reset processing
state. Likewise, for any DSP chip that contains a COP timer, a time-out on this timer can also place the
chip into the reset processing state. The reset processing state is at the highest priority and takes
precedence over any interrupt, including an interrupt in progress.


Assertions on the IRQA and IRQB pins generate IRQA and IRQB interrupts, which are priority level 0 
interrupts and are individually maskable. The IRQA and IRQB interrupt pins are internally synchronized 
with the processor’s internal clock and can be programmed as level-sensitive or edge-sensitive. 


Edge-sensitive interrupts are latched as pending when a falling edge is detected on an IRQ pin. The IRQ
pin’s interrupt-pending bit remains set until its associated interrupt is recognized and serviced by the DSP
core, at which point it is cleared automatically. Specifically, the interrupt-pending bit is cleared when the
second vector location is fetched.


Level-sensitive interrupts, on the other hand, are never latched but go directly into the interrupt controller.
A level-sensitive interrupt is examined and processed when the IRQ pin is low and the interrupt arbiter
allows this interrupt to be recognized. Since there is no interrupt-pending bit associated with
level-sensitive interrupts, the interrupt cannot be cleared automatically when serviced; instead, it must
be explicitly cleared by other means to prevent multiple interrupts.


NOTE: 


On all level-sensitive interrupts, the interrupt must be externally released 
before interrupts are internally re-enabled. Otherwise, the processor will 
be interrupted repeatedly until the release of the level-sensitive interrupt. 
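

The difference between the two trigger modes can be seen in a toy model. The Python sketch below is a behavioral illustration only, not a description of the actual hardware; the class and method names are hypothetical.

```python
class IrqPin:
    """Toy model of one active-low IRQ input."""

    def __init__(self, edge_sensitive):
        self.edge_sensitive = edge_sensitive
        self.level = 1          # pin idles high
        self.pending = False    # interrupt-pending bit (edge mode only)

    def drive(self, level):
        # A falling edge latches the request in edge-sensitive mode.
        if self.edge_sensitive and self.level == 1 and level == 0:
            self.pending = True
        self.level = level

    def requesting(self):
        # Level mode has no latch: the request is simply "pin is low".
        return self.pending if self.edge_sensitive else self.level == 0

    def service(self):
        # Servicing clears the latch; in level mode there is nothing to clear.
        self.pending = False
```

In this model an edge-sensitive pin stops requesting as soon as it is serviced, even if the pin stays low, whereas a level-sensitive pin keeps requesting until the external device releases the line.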


When either the IRQA or IRQB pin is disabled in the IPR, any interrupt request on its associated pin is 
ignored, regardless of whether the input was defined as level-sensitive or edge-sensitive. If the interrupt 
input is defined as edge-sensitive, its interrupt-pending bit will remain in the reset state for as long as the 
interrupt pin is disabled. If the interrupt is defined as level-sensitive, its edge-detection latch will stay in the 
reset state. If the level-sensitive interrupt is disabled while it is pending, it will be cancelled. However, if 
the interrupt has been fetched, it normally will not be cancelled. 


The level-sensitive interrupt capability is useful for the case where there is more than one external interrupt
source, yet only one IRQ pin is available. In this case the interrupts are wire-ORed onto a single IRQ pin
with a resistor pull-up, and any one of these can assert an interrupt. The interrupt service
routine must poll each device and, after finding the source of the interrupt, clear the conditions causing
the interrupt request.


7.3.5.2 DSP Core Hardware Interrupt Sources


Other interrupt sources include the following:
• Stack error interrupt—priority level 1
• OnCE trap—priority level 1
• All on-chip peripherals (such as timers and serial ports)—priority level 0


An overflow of the hardware stack (HWS) causes a stack overflow interrupt that is vectored to P:$000A 
(see Section 5.1.7, “Hardware Stack,” on page 5-6). Encountering the stack overflow condition means that 
too many DO loop addresses have been stacked and that the oldest top-of-loop address has been lost. The 
stack error is non-recoverable. The stack error condition refers to hardware stack overflow and does not 
affect the software stack pointed to by the stack pointer (SP) register in any manner. 


The OnCE trap interrupt is an interrupt that can be set up in the OnCE debug port accessible through the
JTAG pins. This gives the debug port the capability to generate an interrupt on a trigger condition such as 
the matching of an address in the OnCE port (see Section 9.3, “OnCE Port,” on page 9-4 for more 
information). 


In addition to these sources there are seven general-purpose interrupt channels, Ch0 through Ch6, available
for use by on-chip peripherals such as timers and serial ports. Each channel can independently generate an 
interrupt request, each can be individually masked, and each channel can have one or more dedicated 
locations in the interrupt vector table. Typically, one channel is assigned to each on-chip peripheral, but, in 
cases where there are more than seven peripherals that can generate interrupts, it is possible to put more 
than one peripheral on a single interrupt channel. 


7.3.5.3 DSP Core Software Interrupt Sources 


The two software interrupt sources are listed below:
• Software interrupt (SWI)—priority level 1
• Illegal instruction interrupt (III)—priority level 1


An SWI is a non-maskable interrupt that is serviced immediately following the SWI instruction execution
(that is, no other instructions are executed between the SWI instruction and the JSR instruction found in
the interrupt vector table). The difference between an SWI and a JSR instruction is that the SWI sets the
interrupt mask to prevent level 0 (maskable) interrupts from being serviced. The SWI’s ability to mask out
lower-level interrupts makes it very useful for setting breakpoints in monitor programs or for making a
system call in a simple operating system. The JSR instruction does not affect the interrupt mask.


The illegal instruction interrupt is also a non-maskable interrupt (priority level 1). It is serviced
immediately following the execution or attempted execution of an illegal instruction (an undefined
operation code). Illegal instruction exceptions are fatal errors. The JSR located in the illegal instruction interrupt
vector will stack the address of the instruction immediately after the illegal instruction.


[Figure 7-4: (a) instruction fetches from memory, showing main program fetches followed by
interrupt service routine fetches; (b) program controller pipeline, showing the Fetch, Decode, and Execute
stages and the two interrupt control cycles against the instruction cycle count, starting from the point
where the illegal instruction interrupt is recognized as pending.
Legend: i = Interrupt; ii = Interrupt Instruction Word; II = Illegal Instruction; n = Normal Instruction Word.]


Figure 7-4. Illegal Instruction Interrupt Servicing


This interrupt can be used as a diagnostic tool to allow the programmer to examine the stack and locate the 
illegal instruction, or the application program can be restarted with the hope that the failure was a soft 
error. The ILLEGAL instruction, found in Appendix A, “Instruction Set Details,” is useful for testing the
illegal interrupt service routine to verify that it can recover correctly from an illegal instruction. Note that 
the illegal instruction trap does not fire for all invalid opcodes. 


7.3.6 Interrupt Arbitration 


Interrupt arbitration and control, which occurs concurrently with the fetch-decode-execute cycle, takes two
instruction cycles. External interrupts are internally synchronized with the processor clock before their
interrupt-pending flags are set. Each external and internal interrupt has its own flag. After each instruction
is executed, the DSP arbitrates all interrupts. During arbitration, each pending interrupt’s IPL is compared
with the interrupt mask in the SR, and the interrupt is either allowed or disallowed. The remaining pending
interrupts are prioritized according to the IPLs shown in Table 7-7, and the interrupt source with the
highest priority is selected. The interrupt vector corresponding to that source is then placed on the program
address bus so that the program controller can fetch the interrupt instruction.


Table 7-7. Fixed Priority Structure Within an IPL


Priority    Exception                          Enabled By

Level 1 (Non-maskable)

Highest     Hardware RESET                     —
            Watchdog timer reset               —
            Illegal instruction                —
            HWS overflow                       —
            OnCE trap                          —
Lower       SWI                                —

Level 0 (Maskable)

Higher      IRQA (external interrupt)          IPR bit 1
            IRQB (external interrupt)          IPR bit 4
            Channel 6 peripheral interrupt     IPR bit 9
            Channel 5 peripheral interrupt     IPR bit 10
            Channel 4 peripheral interrupt     IPR bit 11
            Channel 3 peripheral interrupt     IPR bit 12
            Channel 2 peripheral interrupt     IPR bit 13
            Channel 1 peripheral interrupt     IPR bit 14
Lowest      Channel 0 peripheral interrupt     IPR bit 15


Interrupts from a given source are not buffered. The processor will not arbitrate a new interrupt from the 
same source until after it fetches the second word of the interrupt vector of the current interrupt. 


An internal interrupt-acknowledge signal clears the appropriate interrupt-pending flag for DSP core 
interrupts. Some peripheral interrupts may also be cleared by the internal interrupt-acknowledge signal, as 
defined in their specifications. Peripheral interrupt requests that need a read/write action to some register 
do not receive the internal interrupt-acknowledge signal, and their interrupt request will remain pending 
until their registers are read/written. Further, if the interrupt comes from an IRQ pin and is programmed as 
level triggered, the interrupt request will not be cleared. The acknowledge signal will be generated after the 
interrupt vectors have been generated, not before. 


If more than one interrupt is pending when an instruction is executed, the processor will first service the
interrupt with the highest priority level. When multiple interrupt requests with the same IPL are pending, a
second fixed-priority structure within that IPL determines which interrupt the processor will service. For
two interrupts programmed at the same priority level (non-maskable or level 0), Table 7-7 shows the
exception priorities within the same priority level. The information in this table only applies when two
interrupts arrive simultaneously or where two interrupts are simultaneously pending.


Whenever a level 0 interrupt has been recognized and exception processing begins, the DSP56800 
interrupt controller changes the interrupt mask bits in the program controller’s SR to allow only level 1 
interrupts to be recognized. This prevents another level 0 interrupt from interrupting the interrupt service 
routine in progress. If an application requires that a level 0 interrupt can interrupt the current interrupt 
service routine, it is necessary to use one of the techniques discussed in Section 8.10.1, “Setting Interrupt 
Priorities in Software,” on page 8-30. 
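

The arbitration rules of this section can be summarized in a short behavioral model. The Python sketch below is an illustration under simplifying assumptions: the priority order is transcribed from Table 7-7, the names `arbitrate` and `service` are hypothetical, and pipeline timing is ignored entirely.

```python
# Fixed priority within each IPL, highest first (from Table 7-7).
PRIORITY = {
    1: ["Hardware RESET", "Watchdog timer reset", "Illegal instruction",
        "HWS overflow", "OnCE trap", "SWI"],
    0: ["IRQA", "IRQB", "CH6", "CH5", "CH4", "CH3", "CH2", "CH1", "CH0"],
}


def arbitrate(pending, mask_level):
    """Pick the next source to service, or None.

    pending    -- set of source names with a request pending
    mask_level -- lowest IPL currently allowed to interrupt (0 or 1)
    """
    for ipl in (1, 0):                  # a higher IPL always wins
        if ipl < mask_level:
            continue                    # masked by the SR mask bits
        for name in PRIORITY[ipl]:      # fixed order within one IPL
            if name in pending:
                return name, ipl
    return None


def service(pending, mask_level):
    """Arbitrate once; return the chosen source and the new mask level."""
    choice = arbitrate(pending, mask_level)
    if choice is None:
        return None, mask_level
    name, _ipl = choice
    pending.discard(name)
    return name, 1                      # the routine runs with level 0 masked
```

With SWI, IRQB, and channel 0 pending and level 0 unmasked, the model services SWI first and raises the mask; IRQB is then held off until the mask returns to level 0, for example when the RTI restores the stacked SR.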


7.3.7 The Interrupt Pipeline 


The interrupt controller generates an interrupt instruction fetch address, which points to the second 
instruction word of a two-word JSR instruction located in the interrupt vector table. This address is used 
instead of the PC for the next instruction fetch. While the interrupt instructions are being fetched, the PC is 
loaded with the address of the interrupt service routine contained within the JSR instruction. After the 
interrupt vector has been fetched, the PC is used for any subsequent instruction fetches and the interrupt is 
guaranteed to be executed. 


Upon executing the JSR instruction fetched from the interrupt vector table, the processor enters the 
appropriate interrupt service routine and exits the exception processing state. The instructions of the 
interrupt service routine are executed in the normal processing state and the routine is terminated with an 
RTI instruction. The RTI instruction restores the PC to the program originally interrupted and the SR to its 
contents before the interrupt occurred. Then program execution resumes. Figure 7-5 shows the interrupt
service routine. The interrupt service routine must return to the main program by executing an
RTI instruction.


The execution of an interrupt service routine always conforms to the following rules: 


1. A JSR to the starting address of the interrupt service routine is located at the first of two 
interrupt vector addresses. 


2. The interrupt mask bits of the SR are updated to mask level 0 interrupts. 


3. The first instruction word of the next interrupt service (of higher IPL) will reach the decoder
only after the decoding of at least four instructions following the decoding of the first
instruction of the previous interrupt.


4. The interrupt service routine can be interrupted (that is, nested interrupts are supported). 


5. The interrupt routine, which can be any length, should be terminated by an RTI, which
restores the PC and SR from the stack.


[Figure 7-5: (a) instruction fetches from memory, showing the main program, the JSR and jump
address fetched from the interrupt vector table, the interrupt subroutine, the point where interrupts are
re-enabled, and the explicit return from interrupt (should be RTI) after which the PC resumes operation;
(b) program controller pipeline, showing the Fetch, Decode, and Execute stages and the two interrupt
control cycles against the instruction cycle count, from the point where the interrupt is synchronized and
recognized as pending.
Legend: i = Interrupt; ii = Interrupt Instruction Word; n = Normal Instruction Word.]


Figure 7-5. Interrupt Service Routine 


Figure 7-5 demonstrates the interrupt pipeline. The point at which interrupts are re-enabled and subsequent 
interrupts are allowed is shown to illustrate the non-interruptible nature of the early instructions in the long 
interrupt service routine. 


Reset is a special exception, which will normally contain only a JMP instruction at the exception start 
address. 


There is only one case in which the stacked address will not point to the illegal instruction. If the illegal 
instruction follows an REP instruction (see Figure 7-6), the processor will effectively execute the illegal 
instruction as a repeated NOP, and the interrupt vector will then be inserted in the pipeline. The next 
instruction will be fetched, decoded, and executed normally.


[Figure 7-6: program controller pipeline, showing the Fetch, Decode, and Execute stages and the
two interrupt control cycles against the instruction cycle count when an illegal instruction follows a REP
instruction, from the point where the illegal instruction interrupt is recognized as pending.
Legend: i = Interrupt; ii = Interrupt Instruction Word; II = Illegal Instruction; n = Normal Instruction Word.]


Figure 7-6. Repeated Illegal Instruction 


In DO loops, if the illegal instruction is in the loop address (LA) location and the instruction preceding it 
(that is, at LA-1) is being interrupted, the loop counter (LC) will be decremented as if the loop had reached 
the LA instruction. When the interrupt service ends and the instruction flow returns to the loop, the 
instruction after the illegal instruction will be fetched (since it is the next sequential instruction in the 
flow). 


7.3.8 Interrupt Latency 


Interrupt latency represents the time between when an interrupt request first appears and when the first 
instruction in an interrupt service routine is actually executed. The interrupt can only take place on 
instruction boundaries, and so the length of execution of an instruction affects interrupt latency. 


There are some special cases to consider. The SWI, STOP, and WAIT instructions are not interruptible. 
Likewise, the REP instruction and the instruction it repeats are not interruptible. 


A REP instruction and the instruction that follows it are treated as a single two-word instruction, regardless 
of how many times it repeats the second instruction of the pair. Instruction fetches are suspended and will 
be reactivated only after the LC is decremented to one (see Figure 7-7). During the execution of n2 in 
Figure 7-7, no interrupts will be serviced. When LC finally decrements to one, the fetches are re-initiated, 
and pending interrupts can be serviced.


[Figure 7-7: (a) instruction fetches from memory, showing main program fetches, the instruction n2
repeated by the REP instruction, the point where the interrupt is synchronized and recognized as pending,
the point where interrupts are re-enabled, and the interrupt service routine fetches (from between P:$0000
and P:$003F); (b) program controller pipeline, showing the Fetch, Decode, and Execute stages and the two
interrupt control cycles against the instruction cycle count.
Legend: i = Interrupt; ii = Interrupt Instruction Word; n = Normal Instruction Word; i% = Interrupt Rejected.]


Figure 7-7. Interrupting a REP Instruction 


7.4 Wait Processing State 


The WAIT instruction brings the processor into the wait processing state, which is one of two low
power-consumption states. Asserting any valid interrupt request higher than the current processing level
(as defined by the I1 and I0 bits in the status register) releases the DSP from the wait state. In the wait state
the internal clock is disabled from all internal circuitry except the internal peripherals. All internal
processing is halted until an unmasked interrupt occurs or until the DSP is reset.


Figure 7-8 shows a WAIT instruction being fetched, decoded, and executed. It is fetched as n3 in this
example and, during decode, is recognized as a WAIT instruction. The following instruction (n4) is aborted,
and the internal clock is disabled from all internal circuitry except the internal peripherals. The processor
stays in this state until an interrupt or reset is recognized. The response time is variable due to the timing of
the interrupt with respect to the internal clock.


[Figure 7-8: program controller pipeline, showing the Fetch, Decode, and Execute stages and the
two interrupt control cycles against the instruction cycle count for a WAIT instruction, including the
interval in which only the internal peripherals receive a clock and the point where the interrupt is
synchronized and recognized as pending.
Legend: i = Interrupt; ii = Interrupt Instruction Word; n = Normal Instruction Word.]


Figure 7-8. Wait Instruction Timing 


Figure 7-8 shows the result of an interrupt bringing the processor out of the wait state. The two appropriate 
interrupt vectors are fetched and put in the instruction pipe. The next instruction fetched is n4, which had 
been aborted earlier. Instruction execution proceeds normally from this point. 


Figure 7-9 shows an example of the wait instruction being executed at the same time that an interrupt is 
pending. Instruction n4 is aborted, as in the preceding example. The wait instruction causes a 
five-instruction-cycle delay from the time it is decoded, after which the interrupt is processed normally. 
The internal clocks are not turned off, and the net effect is that of executing eight NOP instructions 
between the execution of n2 and ii1.


[Figure 7-9: program controller pipeline, showing the Fetch, Decode, and Execute stages and the
two interrupt control cycles against the instruction cycle count when a WAIT instruction executes while an
interrupt is pending; the delay between n2 and ii1 is marked as equivalent to eight NOPs.
Legend: i = Interrupt; ii = Interrupt Instruction Word; n = Normal Instruction Word.]


Figure 7-9. Simultaneous Wait Instruction and Interrupt


7.5 Stop Processing State


The STOP instruction brings the processor into the stop processing state, which is the lowest 
power-consumption state. In the stop state the clock oscillator is gated off, whereas in the wait state the 
clock oscillator remains active. The chip clears all peripheral interrupts and external interrupts (IRQA,
IRQB, and NMI) when it enters the stop state. Stack errors that were pending remain pending. The priority
levels of the peripherals remain as they were before the STOP instruction was executed. The on-chip 
peripherals are held in their respective, individual reset states while the processor is in the stop state. 


The stop processing state halts all activity in the processor until one of the following actions occurs: 
• A low level is applied to the IRQA pin
• A low level is applied to the RESET pin
• An on-chip timer reaches zero


Any of these actions will activate the oscillator, and after a clock stabilization delay, clocks to the
processor and peripherals will be re-enabled. The clock-stabilization delay period is equal to either 16 T
cycles or 131,072 T cycles as determined by the stop delay (SD) bit in the OMR. One T cycle is equal to
one half of a clock cycle. For example, according to Table 6-33 on page 6-28, one NOP instruction
executes in 2 clock cycles; therefore, one NOP instruction executes in 4 T cycles. That is, 1 instruction cycle
equals 2 clock cycles, which equals 4 T cycles.


The stop sequence is composed of eight instruction cycles called stop cycles. They are differentiated from 
normal instruction cycles because the fourth cycle is stretched for an indeterminate period of time while 
the four-phase clock is turned off. 


As shown in Figure 7-10, the STOP instruction is fetched in stop cycle 1, decoded in stop cycle 2 (which is
where it is first recognized as a stop command), and executed in stop cycle 3. The next instruction (n4) is
fetched during stop cycle 2 but is not decoded in stop cycle 3 because, by that time, the STOP instruction
prevents the decode. The processor stops the clock and enters the stop mode. The processor will stay in the
stop mode until it is restarted.


[Figure 7-10: the IRQA signal and the Fetch, Decode, and Execute stages plotted against the stop
cycle count. Annotations mark where the clock is stopped, where the 131,072 T or 16 T cycle count is
started, and where stop cycle count 4 resumes and interrupts are enabled. After the stop sequence, n4 is
fetched again.
Legend: IRQA = Interrupt Request A Signal; n = Normal Instruction Word; STOP = Interrupt Instruction Word.]


Figure 7-10. STOP Instruction Sequence 


Figure 7-11 shows the system being restarted through asserting the IRQA signal. If the exit from the stop 
state was caused by a low level on the IRQA pin, then the processor will service the highest priority 
pending interrupt. If no interrupt is pending, then the processor resumes at the instruction following the 
STOP instruction that brought the processor into the stop state.


[Figure 7-11: the IRQA signal and the Fetch, Decode, and Execute stages plotted against the stop
cycle count. Annotations mark where the clock is stopped, where the 131,072 T or 16 T cycle count is
started, and where stop cycle count 4 resumes and interrupts are enabled. After the stop sequence, the
interrupt instruction word ii1 is fetched.
Legend: IRQA = Interrupt Request A Signal; n = Normal Instruction Word; STOP = Interrupt Instruction Word.]


Figure 7-11. STOP Instruction Sequence


An IRQA deasserted before the end of the stop cycle count will not be recognized as pending. If IRQA is
asserted when the stop cycle count completes, then an IRQA interrupt will be recognized as pending and
will be arbitrated with any other interrupts.


Specifically, when IRQA is asserted, the internal clock generator is started and begins a delay determined 
by the SD bit of the OMR. When the chip uses the internal clock oscillator, the SD bit should be set to zero 
to allow a longer delay time of 128K T cycles (131,072 T cycles), so that the clock oscillator may stabilize. 
When the chip uses a stable external clock, the SD bit may be set to one to allow a shorter (16 T cycle) 
delay time and a faster startup of the chip. 


For example, assume that the SD equals 0 so that the 128K T counter is used. During the 128K T count the 
processor ignores interrupts until the last few counts and, at that time, begins to synchronize them. At the 
end of the 128K T cycle delay period, the chip restarts instruction processing, completes stop cycle 4 
(interrupt arbitration occurs at this time), and executes stop cycles 5, 6, 7, and 8. (It takes 17 T from the end 
of the 128K T delay to the first instruction fetch.) If the IRQA signal is released (pulled high) after a 
minimum of 4 T but after fewer than 128K T cycles, no IRQA interrupt will occur, and the instruction 
fetched after stop cycle 8 will be the next sequential instruction (n4 in Figure 7-10). An IRQA interrupt 
will be serviced as shown in Figure 7-11 if the following conditions are true: 


1. The IRQA signal had previously been initialized as level sensitive. 


2. IRQA is held low from the end of the 128K T cycle delay counter to the end of stop cycle 
count 8. 


3. No interrupt with a higher interrupt level is pending. 


If IRQA is not asserted during the last part of the STOP instruction sequence (stop cycles 6, 7, and 8) and 
if no interrupts are pending, the processor will refetch the next sequential instruction (n4). In Figure 7-11, 
by contrast, the IRQA signal is still asserted, so the processor will recognize the interrupt and fetch and 
execute the JSR instruction located at P:$0010 and P:$0011 (the IRQA interrupt vector locations). 


To ensure servicing IRQA immediately after leaving the stop state, the following steps must be taken 
before the execution of the STOP instruction: 


1. Define IRQA as level sensitive; an edge-triggered interrupt will not be serviced. 

2. Ensure that no stack error is pending. 

3. Execute the STOP instruction and enter the stop state. 

4. Recover from the stop state by asserting the IRQA pin and holding it asserted for the entire 
clock recovery time. If it is low, the IRQA vector will be fetched. 

5. The exact elapsed time for clock recovery is unpredictable. The external device that asserts 
IRQA must wait for some positive feedback, such as specific memory access or a change 
in some predetermined I/O pin, before deasserting IRQA. 


The STOP sequence totals 131,104 T cycles (if the SD equals 0) or 48 T cycles (if the SD equals 1) in 
addition to the period with no clocks from the stop fetch to the IRQA vector fetch (or next instruction). 
However, there is an additional delay if the internal oscillator is used. An indeterminate period of time is 
needed for the oscillator to begin oscillating and then stabilize its amplitude. The processor will still count 
131,072 T cycles (or 16 T cycles), but the period of the first oscillator cycles will be irregular; thus, an 
additional period of 19,000 T cycles should be allowed for oscillator irregularity (the specification 
recommends a total minimum period of 150,000 T cycles for oscillator stabilization). If an external 
oscillator is used that is already stabilized, no additional time is needed. 


The PLL may or may not be disabled when the chip enters the stop state. If it is disabled and will not be 
re-enabled when the chip leaves the stop state, the number of T cycles will be much greater because the 
PLL must regain lock. 


If the STOP instruction is executed when the IRQA signal is asserted, the clock generator will not be 
stopped, but the four-phase clock will be disabled for the duration of the 128K T cycle (or 16 T cycle) 
delay count. In this case the STOP instruction looks like a 131,072 T + 35 T cycle (or 51 T cycle) NOP, 
since the STOP instruction itself is eight instruction cycles long (32 T) and synchronization of IRQA is 3 
T, totaling 35 T. 
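

The cycle totals quoted in this section can be cross-checked with a few lines of arithmetic. The following 
Python sketch is illustrative only: the function names are hypothetical, and it models nothing beyond the 
T-cycle figures stated in the text (the 131,072 T or 16 T delay selected by the SD bit, the 32 T STOP 
instruction, and the 3 T IRQA synchronization). 

```python
# Illustrative arithmetic only. Names are hypothetical; figures come from the text.

STOP_INSTRUCTION_T = 32   # STOP itself is eight instruction cycles long (32 T)
IRQA_SYNC_T = 3           # synchronization of IRQA takes 3 T


def delay_count(sd_bit: int) -> int:
    """T cycles counted by the stop delay counter for a given SD bit of the OMR."""
    if sd_bit not in (0, 1):
        raise ValueError("SD is a single bit")
    return 16 if sd_bit else 131_072  # 131,072 T = 128K T


def stop_sequence_total(sd_bit: int) -> int:
    """Length of the STOP sequence, excluding the period with no clocks."""
    return delay_count(sd_bit) + STOP_INSTRUCTION_T


def stop_as_nop_total(sd_bit: int) -> int:
    """Length of STOP when IRQA is already asserted (STOP acts like a long NOP)."""
    return delay_count(sd_bit) + STOP_INSTRUCTION_T + IRQA_SYNC_T


print(stop_sequence_total(0), stop_sequence_total(1))
print(stop_as_nop_total(0), stop_as_nop_total(1))
```

The first line printed reproduces the 131,104 T and 48 T totals; the second reproduces the 131,072 T + 35 T 
and 51 T figures for the case where IRQA is asserted when STOP executes. 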


A stack error interrupt that is pending before the processor enters the stop state is not cleared and will 
remain pending. During the clock-stabilization delay in stop mode, any edge-triggered IRQ interrupts are 
cleared and ignored. 


If RESET is used to restart the processor (see Figure 7-12), the 128K T cycle delay counter would not be 
used, all pending interrupts would be discarded, and the processor would immediately enter the reset 
processing state as described in Section 7.1, “Reset Processing State.” For example, the stabilization time 
recommended in DSP56824 Technical Data for the clock (RESET should be asserted for this time) is only 
50 T for a stabilized external clock, but is the same 150,000 T for the internal oscillator. These stabilization 
times are recommended and are not imposed by internal timers or time delays. The DSP fetches 
instructions immediately after exiting reset. If the user wishes to use the 128K T (or 16 T) delay counter, it 
can be started by asserting IRQA for a short time (about two clock cycles). 


[Timing diagram: the RESET signal and the Fetch, Decode, and Execute pipeline stages for a STOP 
instruction followed by recovery through reset. Annotations: Clock Stopped; Processor Enters Reset State; 
Processor Leaves Reset State; Interrupt Control Cycle 1; Interrupt Control Cycle 2.] 

RESET = Interrupt 
n = Normal Instruction Word 
nA, nB, nC = Instructions in Reset Routine 
STOP = Interrupt Instruction Word 


Figure 7-12. STOP Instruction Sequence Recovering with RESET 

7.6 Debug Processing State 


The debug processing state is a state where the DSP core is halted and under the control of the OnCE 
debug port. Serial data is shifted in and out of this port, and it is possible to execute single instructions 
from this processing state. The debug processing state and the operation of the OnCE port are covered in 
more detail in Chapter 9, “JTAG and On-Chip Emulation (OnCE™).” 


Different software techniques can be used to fully exploit the DSP56800 architecture’s resources and 
enhance its features. For example, small sequences of DSP56800 instructions can emulate more powerful 
instructions. This chapter discusses how better performance can be obtained from the DSP56800 
architecture using software techniques. The following topics are covered: 


• Synthesizing useful new instructions 
• Techniques for shifting 16- and 32-bit values 
• Incrementing and decrementing 
• Division techniques 
• Pushing variables onto the software stack 
• Different looping and nested-looping techniques 
• Different techniques for array indexing 
• Parameter passing and local variables 
• Freeing up registers for time-critical loops 
• Interrupt programming 
• Jumps and JSRs using a register value 
• Freeing one hardware stack (HWS) location 
• Multi-tasking and the HWS 


8.1 Useful Instruction Operations 


The flexible instruction set of the DSP56800 architecture allows new instructions to be synthesized from 
existing DSP56800 instructions. This section presents some of these useful operations that are not directly 
supported by the DSP56800 instruction set, but can be efficiently synthesized. Table 8-1 lists operations 
that can be synthesized using DSP56800 instructions. 


Table 8-1. Operations Synthesized Using DSP56800 Instructions 


Operation                          Description 
JRSET, JRCLR                       Jumps if all selected bits in a bit field are set or clear 
BR1SET, BR1CLR                     Branches if at least one selected bit in a bit field is set or clear 
JR1SET, JR1CLR                     Jumps if at least one selected bit in a bit field is set or clear 
JVS, JVC, BVS, BVC                 Jumps or branches if the overflow bit is set or clear 
JPL, JMI, JES, JEC, JLMS, JLMC,    Jumps or branches on other condition codes 
BPL, BMI, BES, BEC, BLMS, BLMC 
NEGW                               Negates the upper two registers of an accumulator 
NEG                                Negates another data ALU register, an AGU register, or a memory location 
XCHG                               Exchanges any two registers 
MAX                                Returns the maximum of two registers 
MIN                                Returns the minimum of two registers 
Accumulator sign extend            Sign extends the accumulator into the A2 or B2 portion 
Accumulator unsigned load          Zeros the accumulator LSP and extension register 

8.1.1 Jumps and Branches 


Several operations for jumping and branching can be emulated, depending on selected bits in a bit field, 
overflows, or other condition codes. 


8.1.1.1 JRSET and JRCLR Operations 


The JRSET and JRCLR operations are very similar to the BRSET and BRCLR instructions. They still test 
a bit field and go to another address if all masked bits are either set or cleared. The BRSET and BRCLR 
instructions allow branches of only up to 64 locations away from the current instruction and can test only 
an 8-bit field; however, JRSET and JRCLR operations allow jumps to anywhere in the 64K-word program 
address space, and can specify a 16-bit mask. The following code shows that these two operations allow 
the same addressing modes as the BFTSTH and BFTSTL instructions. 


Example 8-1. JRSET and JRCLR 


; JRSET Operation 

; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words 

BFTSTH #XXXX, X:<ea> ; 16-bit mask allowed 

JCS label ; 16-bit jump address allowed 


; JRCLR Operation 
; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words 
BFTSTL #XXXX, X:<ea> ; 16-bit mask allowed 
JCS label ; 16-bit jump address allowed 


8.1.1.2 BR1SET and BR1CLR Operations 


The BR1SET and BR1CLR operations are very similar to the BRSET and BRCLR instructions. They still 
test a bit field and branch to another address based on the result of some test. The difference is that for 
BRSET and BRCLR the condition is true if all selected bits in the bit field are 1s or 0s, respectively, 
whereas for BR1SET and BR1CLR the condition is true if at least one of the selected bits in the bit field is 
a 1 or 0, respectively. BR1SET and BR1CLR operations can also specify a 16-bit mask, compared to an 
8-bit mask for BRSET and BRCLR. The following code shows that these two operations allow the same 
addressing modes as the BFTSTH and BFTSTL instructions. 


Example 8-2. BR1SET and BR1CLR 


; BR1SET Operation 
; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words 
BFTSTL #XXXX, X:<ea> ; 16-bit mask allowed 
BCC label ; 7-bit signed PC relative offset allowed 


; BR1CLR Operation 
; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words 
BFTSTH #XXXX, X:<ea> ; 16-bit mask allowed 
BCC label ; 7-bit signed PC relative offset allowed 


8.1.1.3 JR1SET and JR1CLR Operations 


The JR1SET and JR1CLR operations are very similar to the JRSET and JRCLR operations. They still test 
a bit field and jump to another address based on the result of some test. The difference is that for JRSET 
and JRCLR the condition is true if all selected bits in the bit field are 1s or 0s, respectively, whereas for 
JR1SET and JR1CLR the condition is true if at least one of the selected bits in the bit field is a 1 or 0, 
respectively. JR1SET and JR1CLR operations allow jumps to anywhere in the 64K-word program address 
space, and can specify a 16-bit mask. The following code shows that these two operations allow the same 
addressing modes as the BFTSTH and BFTSTL instructions. 


Example 8-3. JR1SET and JR1CLR 


; JR1SET Operation 
; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words 
BFTSTL #XXXX, X:<ea> ; 16-bit mask allowed 
JCC label ; 16-bit jump address allowed 


; JR1CLR Operation 
; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words 
BFTSTH #XXXX, X:<ea> ; 16-bit mask allowed 
JCC label ; 16-bit jump address allowed 
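

The relationship between the "all selected bits" and "at least one selected bit" tests can be checked with a 
small model. The Python sketch below is not DSP56800 code; the function names are hypothetical, and it 
assumes only that BFTSTH sets the carry bit when every masked bit is 1 and that BFTSTL sets it when every 
masked bit is 0, as the examples above imply. 

```python
# Hypothetical model of the carry results used by the emulated jump operations.

def bftsth(mask: int, value: int) -> bool:
    """Carry after BFTSTH: set when every bit selected by mask is 1."""
    mask &= 0xFFFF
    return (value & mask) == mask


def bftstl(mask: int, value: int) -> bool:
    """Carry after BFTSTL: set when every bit selected by mask is 0."""
    return (value & mask & 0xFFFF) == 0


def jrset_taken(mask: int, value: int) -> bool:
    return bftsth(mask, value)          # BFTSTH followed by JCS


def jrclr_taken(mask: int, value: int) -> bool:
    return bftstl(mask, value)          # BFTSTL followed by JCS


def jr1set_taken(mask: int, value: int) -> bool:
    return not bftstl(mask, value)      # BFTSTL followed by JCC


def jr1clr_taken(mask: int, value: int) -> bool:
    return not bftsth(mask, value)      # BFTSTH followed by JCC


value = 0x00F0
print(jrset_taken(0x0030, value), jrset_taken(0x0130, value))
print(jr1set_taken(0x0130, value), jr1clr_taken(0x0130, value))
```

"At least one bit set" is simply the negation of "all bits clear," which is why JR1SET pairs BFTSTL with a 
carry-clear test rather than BFTSTH with a carry-set test. 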


8.1.1.4 JVS, JVC, BVS, and BVC Operations 


Although there is no instruction for jumping or branching on overflow, such an operation can be emulated 
as shown in the following code. Note that the carry bit will be destroyed by this operation since it receives 
the result of the BFTSTH instruction. The following code shows JVS and BVC. 


Example 8-4. JVS, JVC, BVS and BVC 


; JVS Operation 
; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words 
BFTSTH #$0002,SR ; Test V bit in SR 
JCS label ; 16-bit jump address allowed 


; BVC Operation 
; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words 
BFTSTH #$0002,SR ; Test V bit in SR 
BCC label ; 7-bit signed PC relative offset allowed 


8.1.1.5 Other Jumps and Branches on Condition Codes 


Jumping and branching using some of the other condition codes (PL, MI, EC, ES, LC, LS) can be 
accomplished in the same manner as for overflow; see Section 8.1.1.4, “JVS, JVC, BVS, and BVC 
Operations.” Remember that this technique destroys the value in the carry bit. The following code shows 
JPL and BES. 


Example 8-5. JPL and BES 


; JPL Operation 

; Emulated in 5 Icyc (4 Icyc if false), 4 Instruction Words 

BFTSTH #$0008,SR ; Test the N bit in SR 

JCC label ; 16-bit jump address allowed 


; BES Operation 

; Emulated in 5 Icyc (4 Icyc if false), 3 Instruction Words 

BFTSTH #$0020,SR ; Test E bit in SR 

BCS label ; 7-bit signed PC relative offset allowed 


Similar code can be written for JMI, JEC, JES, JLMC, JLMS, BPL, BMI, BEC, BLMC, and BLMS. The 
names JLMS and JLMC stand for “jump if limit set” and “jump if limit clear,” respectively; these names 
are used to avoid any confusion with the JLS (“jump if lower or same”) instruction. 
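

The pattern behind all of these emulations is the same: test one SR bit with BFTSTH, then jump or branch 
on carry set or carry clear. The Python sketch below tabulates that pattern. It is a hypothetical helper, not 
part of any toolchain; the V, N, and E masks are taken from the examples above, while the mask for the L 
bit ($0040) is an assumption that should be checked against the SR description. 

```python
# V, N, and E masks appear in the examples above; the L mask is an assumption.
SR_MASKS = {"V": 0x0002, "N": 0x0008, "E": 0x0020, "L": 0x0040}

# Condition suffix -> (SR bit, bit value for which the jump or branch is taken).
CONDITIONS = {
    "VS": ("V", 1), "VC": ("V", 0),
    "MI": ("N", 1), "PL": ("N", 0),
    "ES": ("E", 1), "EC": ("E", 0),
    "LMS": ("L", 1), "LMC": ("L", 0),
}


def emulation(cond: str):
    """Return (BFTSTH mask, carry test) for an emulated condition."""
    bit, wanted = CONDITIONS[cond]
    return SR_MASKS[bit], "CS" if wanted else "CC"


def taken(cond: str, sr: int) -> bool:
    """Model whether the emulated jump or branch is taken for a given SR value."""
    mask, carry_test = emulation(cond)
    carry = (sr & mask) == mask   # BFTSTH result for a single-bit mask
    return carry if carry_test == "CS" else not carry


for name in ("PL", "ES", "VS"):
    mask, test = emulation(name)
    print(f"J{name}: BFTSTH #${mask:04X},SR then J{test}")
```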


8.1.2 Negation Operations 


The NEGW operation can be used to negate the upper two registers of the accumulator. The NEG 
operation can be used to negate the X0, Y0, or Y1 data ALU registers, negate an AGU register, or negate a 
memory location. 


8.1.2.1 NEGW Operation 


The NEGW operation can be emulated as shown in the following code: 


; 20-bit NEGW Operation 
; Operates on EXT:MSP, Clears LSP, 3 Icyc 
MOVE    #0,A0           ; Clear LSP 
NEG     A               ; Now negates upper 20 bits of accumulator 
                        ; since A0 = 0 


This correctly negates the upper 20 bits of the accumulator, but also destroys the A0 register. 

The NEG instruction can be used directly, executing in one instruction cycle, in cases where it is already 
known that the least significant portion (LSP) of an accumulator is $0000. This is true immediately after a 
value is moved to the A or B accumulator from memory or a register, as shown in the following code: 


; Example of 1 Icyc NEGW Operation 
; Works because A0 is already equal to $0000 
MOVE    X:(R0),A        ; Move a 16-bit value to an accumulator, 
                        ; clearing A0 register 
NEG     A               ; Now negates upper 20 bits of accumulator 
                        ; since A0 = 0 


The technique shown in the following code can be used for cases when 16-bit data is being processed and 
when it can be guaranteed that the LSP or extension register of the accumulator contains no required 
information: 


; 16-bit NEGW Operation 
; Operates on MSP, Forces EXT to sign extension, LSP to $0, 2 Icyc 
MOVE    A1,A            ; Force A2 to sign extension, 
                        ; force A0 cleared 
NEG     A               ; Now negates upper 20 bits of accumulator 
                        ; since A0 = 0 


The following technique may be used for the case where the CC bit in the SR is set to a 1, the LSP may not 
be $0000, and the user is not interested in the values in the accumulator extension registers: 


; 16-bit NEGW Operation 
; CC bit must be set, operates on MSP, doesn’t affect A0, 2 Icyc 
NOT     A               ; One’s-complement of A1, A2 unchanged 
INCW    A               ; Increment to get two’s-complement, 
                        ; A2 may be incorrect 
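

Why the LSP must be zero before NEG is used as NEGW can be seen from a simple numeric model. The 
Python sketch below treats the accumulator as a 36-bit two's-complement integer (A2:A1:A0); the function 
names are hypothetical, and the model ignores condition codes and limiting. 

```python
MASK36 = (1 << 36) - 1   # accumulator modeled as A2 (4 bits) : A1 (16) : A0 (16)


def neg36(acc: int) -> int:
    """Two's-complement negation of the full 36-bit accumulator (NEG A)."""
    return (-acc) & MASK36


def upper20(acc: int) -> int:
    """The EXT:MSP portion (A2:A1) of the accumulator."""
    return (acc >> 16) & 0xFFFFF


def negw(acc: int) -> int:
    """MOVE #0,A0 followed by NEG A: negates A2:A1 and leaves A0 at zero."""
    cleared = acc & MASK36 & ~0xFFFF
    return neg36(cleared)


def negate_msp_only(a1: int) -> int:
    """NOT followed by INCW, modeled on the 16-bit MSP only (A2 is not modeled)."""
    return ((~a1) + 1) & 0xFFFF


acc = 0x0_0001_1234
print(hex(upper20(neg36(acc))))  # a nonzero A0 disturbs the upper 20 bits
print(hex(upper20(negw(acc))))   # with A0 cleared the upper 20 bits are negated
```

With a nonzero A0, the negation of the full accumulator yields the one's complement of A2:A1 rather than 
its two's complement, which is the reason for clearing A0 first. 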


8.1.2.2 Negating the X0, Y0, or Y1 Data ALU Registers 


Although the NEG instruction is supported on accumulators only, NEG can be emulated to perform a 
negation of the data ALU’s X0, Y0, or Y1 registers, as shown in the following code: 


; NEG Operation 
; Emulated at 2 Icyc 
NOT     Y0 
INCW    Y0 


8.1.2.3 Negating an AGU register 


It is possible to negate one of the AGU registers (Rn) without destroying any other register, as shown in the 
following code: 


; NEG Operation 

; Emulated at 3 Icyc 
NOTC    R0 
LEA     (R0)+ 


8.1.2.4 Negating a Memory Location 


It is possible to negate a memory location, as shown in the following code: 


; NEG Operation 

; Emulated at 5 Icyc 
NOTC X:$19 
INCW X:$19 


When an accumulator is available, it may be faster to do this operation simply by moving the value to an 
accumulator, performing the operation there, and moving the result back to memory. 


8.1.3 Register Exchanges 


The XCHG operation can be emulated as shown in the following code: 


; XCHG Operation 
; Emulated at 4 Icyc 
PUSH    X0 
MOVE    A,X0 
POP     A 


If a register is available, the exchange of any two registers can be emulated as shown in the following code: 


; XCHG Operation 
; Emulated at 3 Icyc 
MOVE    X0,N 
MOVE    A,X0 
MOVE    N,A 


A faster exchange of any two registers can be emulated using one address register when N equals 0, as 
shown in the following code: 


; XCHG Operation 
; N register is 0, Emulated at 2 Icyc 
MOVE    A,X:(R0) 
TFR     X0,A    X:(R0)+N,X0 


8.1.4 Minimum and Maximum Values 


The MAX operation returns the maximum of two values; the MIN operation returns the minimum. 


8.1.4.1 MAX Operation 


The MAX operation can be emulated as shown in the following code: 


; MAX Operation 
MAX     X0,A 

; ----- becomes ----- 

; MAX operation 
; Emulated at 4 Icyc 
CMP     X0,A 
TGT     X0,A            ; (can also use TGE if desired) 


8.1.4.2 MIN Operation 


The MIN operation can be emulated as shown in the following code: 


; MIN Operation 
MIN     Y0,A 

; ----- becomes ----- 

; MIN Operation 
; Emulated at 4 Icyc 
CMP     Y0,A 
TLT     Y0,A            ; (can also use TLE if desired) 


8.1.5 Accumulator Sign Extend 


There are two versions of this operation. In the first, the accumulator only contains 16 bits of useful 
information in A1 or B1, and it is necessary to sign extend into A2 or B2. In the second version, both A1 
and A0 or B1 and B0 contain useful information. The following code shows both versions: 


; Sign-Extension Operation of 16-bit Accumulator Data 
; Emulated in 1 Icyc, 1 Instruction Word 
MOVE    A1,A            ; Sign extend into A2, clear A0 register 


; Sign-Extension Operation of 32-bit Accumulator Data 
; Emulated in 4 Icyc, 4 Instruction Words 
PUSH    A0              ; Save A0 register 
MOVE    A1,A            ; Sign extend into A2, clear A0 register 
POP     A0              ; Restore A0 register to correct contents 
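

The effect of both versions on the three accumulator portions can be illustrated with a short model. This 
Python sketch is an assumption-laden illustration, not a simulator: the function names are hypothetical, 
and it models only the behavior described in the comments above (MOVE A1,A sign extends into A2 and 
clears A0). 

```python
def move_a1_to_a(a1: int):
    """Model of MOVE A1,A: returns (A2, A1, A0) with A2 sign extended, A0 cleared."""
    a1 &= 0xFFFF
    a2 = 0xF if a1 & 0x8000 else 0x0
    return a2, a1, 0x0000


def sign_extend_32(a1: int, a0: int):
    """Model of PUSH A0 / MOVE A1,A / POP A0: sign extend while keeping A0."""
    saved_a0 = a0 & 0xFFFF            # PUSH A0
    a2, a1, _ = move_a1_to_a(a1)      # MOVE A1,A (clears A0)
    return a2, a1, saved_a0           # POP A0


print([hex(part) for part in move_a1_to_a(0x8000)])
print([hex(part) for part in sign_extend_32(0x8000, 0x1234)])
```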


8.1.6 Unsigned Load of an Accumulator 


The unsigned load of an accumulator, which zeros the LSP and extension register, can be exactly emulated 
as shown in the following code: 


; DSP56100 Family Unsigned Load 
; Emulated at 2 Icyc 
MOVE    X:(R0),A 
ZERO    A 

; ----- becomes ----- 

; DSP56800 Family Unsigned Load 
; Emulated at 2 Icyc 
CLR     A 
MOVE    X:(R0),A1 


This operation is important for processing unsigned numbers when the CC bit in the operating mode 
register (OMR) is a 0, so that the condition codes are set using information at bit 35. This operation 
is useful for performing unsigned additions and subtractions on 36-bit values. 
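

The difference between a normal (sign-extending) load and the unsigned load matters for any word whose 
top bit is set. The following Python sketch compares the two as 36-bit values; the helper names are 
hypothetical, and the model assumes only what the text states: a move into the full accumulator sign 
extends into A2 and clears A0, while CLR A followed by a move into A1 leaves A2 and A0 at zero. 

```python
def acc_value(a2: int, a1: int, a0: int) -> int:
    """Interpret A2:A1:A0 as a signed 36-bit integer."""
    raw = ((a2 & 0xF) << 32) | ((a1 & 0xFFFF) << 16) | (a0 & 0xFFFF)
    return raw - (1 << 36) if raw & (1 << 35) else raw


def signed_load(word: int):
    """MOVE X:(R0),A: sign extend into A2, clear A0."""
    word &= 0xFFFF
    return (0xF if word & 0x8000 else 0x0), word, 0


def unsigned_load(word: int):
    """CLR A then MOVE X:(R0),A1: A2 and A0 stay zero."""
    return 0x0, word & 0xFFFF, 0


word = 0xC000
print(acc_value(*signed_load(word)))    # negative: bit 35 is set
print(acc_value(*unsigned_load(word)))  # positive: extension register is zero
```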


8.2 16- and 32-Bit Shift Operations 


This section presents many different methods for performing shift operations on the DSP56800 
architecture. Different techniques offer different advantages. Some techniques require several registers, 
while others can be performed only on the register to be shifted. It is even possible to shift the value in one 
register but place the result in a different register. Techniques are also presented for shifting 36-bit values 
by large immediate values. 


8.2.1 Small Immediate 16- or 32-Bit Shifts 


If it is only necessary to shift a register or accumulator by a small amount, one of the two techniques shown 
in the following code may be adequate. These techniques may also be appropriate if there are no registers 
available for use in the shifting operation, since more than one register is required with the multi-bit 
shifting instructions. For cases where the number of bit positions to shift is larger than three for 16-bit 
registers or five for a 32-bit value, it may be appropriate to use another technique. 


; First Technique - Shift an Accumulator by 3 Bits - Use Inline Code 


ASL A 
ASL A 
ASL A 


; Second Technique - Shift an Accumulator by 6 Bits - Use REP Loop 
REP #6 
ASL A 


For places in a program that are executed infrequently, the second technique of using a REP (or DO) loop 
results in the smallest code size. 


8.2.2 General 16-Bit Shifts 


For fast 16-bit shifting, the ASLL, ASRR, LSLL, and LSRR instructions allow for single-cycle shifting of 
a 16-bit value where the shift count is specified by a register. If it is desired to shift by an immediate value, 
the immediate value must first be loaded into a register as shown in the following code: 


; Shifting a 16-Bit Value by an Immediate Value 
; Executes in 2 Icyc, 2 Instruction Words 
MOVE    #7,X0           ; Load shift count into the X0 register 
ASLL    Y0,X0,Y0        ; Arithmetically shift the contents of Y0 
                        ; 7 bits to the left 


Note that these instructions clear the LSP of an accumulator. It is possible to perform a right shift where 
the bits shifted into the LSP of the accumulator are not lost. Instead of using the ASRR or LSRR 
instructions, a CLR instruction is first used to clear the accumulator, and then an ASRAC or LSRAC 
instruction is performed. This technique allows a 16-bit value to be right shifted into a 32-bit field, as 
shown in the following code: 


; Shifting a 16-bit Value into a 32-bit field 
; Executes in 2 Icyc, 2 Instruction Words 
CLR A ; Clear accumulator 
ASRAC YO,X0,A ; Arithmetically shift into a 32-bit field 
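

The benefit of shifting into a 32-bit field is that the bits shifted out of the upper word are kept in the 
lower word instead of being discarded. The Python sketch below contrasts the two results. It is a numeric 
illustration with hypothetical function names; it assumes that the ASRAC result, starting from a cleared 
accumulator, is the 16-bit source placed in the upper word and arithmetically shifted right across 32 bits. 

```python
def signed16(word: int) -> int:
    """Interpret a 16-bit word as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def shift_16_only(word: int, count: int) -> int:
    """16-bit arithmetic right shift: bits shifted out are lost."""
    return (signed16(word) >> count) & 0xFFFF


def shift_into_32_bit_field(word: int, count: int) -> int:
    """CLR A then ASRAC: the shifted-out bits land in the lower word (A0)."""
    field = signed16(word) << 16          # source aligned with the upper word
    return (field >> count) & 0xFFFFFFFF  # A1:A0 after the shift


word, count = 0x8001, 4
print(hex(shift_16_only(word, count)))
print(hex(shift_into_32_bit_field(word, count)))
```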


8.2.3 General 32-Bit Arithmetic Right Shifts 


It is possible to perform right shifting of up to 15 bits on 32-bit values using the techniques presented in 
this section. 


The following example shows how to arithmetically shift the 32-bit contents of the Y1:Y0 registers, 
storing the results into the A accumulator. Note that this technique uses many of the data ALU registers: 
Y1 and Y0 to hold the value to be shifted, X0 to hold the amount to be shifted, and the A accumulator to 
store the result. The following code allows shifts of 0 to 15 bits and executes in five instruction cycles. 


; Arithmetically Shift Y1:Y0 Register Combination by 8 bits 
; Emulated in 5 Icyc, 5 Instruction Words 
MOVE    #8,X0 
LSRR    Y0,X0,A         ; Logically shift lower word 
MOVE    A1,A0           ; 16-bit arithmetic right shift 
MOVE    A2,A1 
ASRAC   Y1,X0,A         ; Arithmetically shift upper word and 
                        ; combine with lower word 


If it is necessary to shift by more than 15 bits, then the following code should be preceded by a shift of 16 
bits, as documented later in this section. 


Similar code that follows shows how to arithmetically shift the 32-bit value in the A accumulator. Again, 
this technique takes several registers: Y1 to hold the most significant word (MSW) to be shifted and Y0 to 
hold the amount to be shifted. This, perhaps, is only useful when the amount to be shifted is a variable 
amount or when the amount to be shifted is eight or more and the Y1 and Y0 registers are available. Note 
that the extension register (A2) is not shifted in this case. 


; Arithmetically Shift A1:A0 Accumulator by 11 bits 
; Emulated in 7 Icyc, 7 Instruction Words 
MOVE    #11,Y0 
MOVE    A1,Y1           ; Save copy of A1 register (upper word 
                        ; to be shifted) 
MOVE    A0,A1 
LSRR    A1,Y0,A         ; Logically shift lower word 
MOVE    A1,A0           ; 16-bit arithmetic right shift 
MOVE    A2,A1 
ASRAC   Y1,Y0,A         ; Arithmetically shift upper word and 
                        ; combine with lower word 


8.2.4 General 32-Bit Logical Right Shifts 


Right shifting logically is identical to right shifting arithmetically except for the final shift instruction. For 
arithmetic shifts of 32-bit values the final instruction is an ASRAC instruction, and for logical shifts of 
32-bit values the final instruction is a LSRAC instruction. This is shown in the following code: 


; Logically Shift Y1:Y0 Register Combination by 8 bits 
; Emulated in 5 Icyc, 5 Instruction Words 
MOVE    #8,X0 
LSRR    Y0,X0,A         ; Logically shift lower word 
MOVE    A1,A0           ; 16-bit arithmetic right shift 
MOVE    A2,A1 
LSRAC   Y1,X0,A         ; Logically shift upper word and 
                        ; combine with lower word 
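

The 32-bit right-shift sequences split the work into two word-sized shifts whose results do not overlap. 
The Python sketch below mirrors those steps numerically and compares them with a direct 32-bit shift. It 
is an illustration under stated assumptions, with hypothetical names: the lower word is shifted logically, 
moved down by 16 bits, and then the shifted upper word (sign extended for the arithmetic case) is 
accumulated on top of it. 

```python
def signed16(word: int) -> int:
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def shift_right_32(y1: int, y0: int, count: int, arithmetic: bool = True) -> int:
    """Model of the five-instruction sequence; returns the 32-bit result A1:A0."""
    if not 0 <= count <= 15:
        raise ValueError("the technique covers shifts of 0 to 15 bits")
    low = (y0 & 0xFFFF) >> count            # LSRR Y0,X0,A (result in A1)
    acc = low                               # MOVE A1,A0 / MOVE A2,A1
    upper = signed16(y1) if arithmetic else (y1 & 0xFFFF)
    acc += (upper << 16) >> count           # ASRAC or LSRAC Y1,X0,A
    return acc & 0xFFFFFFFF


def reference(value32: int, count: int, arithmetic: bool = True) -> int:
    """Direct 32-bit shift used as a cross-check."""
    value32 &= 0xFFFFFFFF
    if arithmetic and value32 & 0x80000000:
        value32 -= 1 << 32
    return (value32 >> count) & 0xFFFFFFFF


print(hex(shift_right_32(0x8765, 0x4321, 8, arithmetic=True)))
print(hex(shift_right_32(0x8765, 0x4321, 8, arithmetic=False)))
```

The low-order bits of Y0 that fall off the end of the LSRR result are exactly the bits that a 32-bit right 
shift would discard, so nothing of value is lost by clearing them. 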


8.2.5 Arithmetic Shifts by a Fixed Amount 


Arithmetic shifts (left or right) by a fixed amount can be emulated with the ASRxx and ASLxx operations. 


8.2.5.1 Right Shifts (ASR12—ASR20) 


For arithmetic right shifts there is a faster way to shift an accumulator for large shift counts. The following 
code shows how to perform arithmetic right shifts of 12 through 20 bits on an accumulator. This emulation 
works without destroying any registers on the chip. If desired, it is possible to use this technique for bit 
shifts greater than 20, but it is not possible to use this technique for shifts of 11 or fewer bits without losing 
information. 


; ASR12 Operation 
; Emulated in 8 Icyc, 8 Instruction Words 
ASL     A 
ASL     A 
ASL     A 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A2,A 
POP     A0 


; ASR13 Operation 
; Emulated in 7 Icyc, 7 Instruction Words 
ASL     A 
ASL     A 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A2,A 
POP     A0 


; ASR14 Operation 
; Emulated in 6 Icyc, 6 Instruction Words 
ASL     A 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A2,A 
POP     A0 


; ASR15 Operation 
; Emulated in 5 Icyc, 5 Instruction Words 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A2,A 
POP     A0 


; ASR16 Operation 
; Emulated in 2 Icyc, 2 Instruction Words 
MOVE    A1,A0           ; (Assumes EXT contains sign extension) 
MOVE    A2,A1 


; ASR17 Operation 
; Emulated in 3 Icyc, 3 Instruction Words 
ASR     A 
MOVE    A1,A0           ; (Assumes EXT contains sign extension) 
MOVE    A2,A1 


; ASR18 Operation 
; Emulated in 4 Icyc, 4 Instruction Words 
ASR     A 
ASR     A 
MOVE    A1,A0           ; (Assumes EXT contains sign extension) 
MOVE    A2,A1 


; ASR19 Operation 
; Emulated in 5 Icyc, 5 Instruction Words 
ASR     A 
ASR     A 
ASR     A 
MOVE    A1,A0           ; (Assumes EXT contains sign extension) 
MOVE    A2,A1 


; ASR20 Operation 
; Emulated in 6 Icyc, 6 Instruction Words 
ASR     A 
ASR     A 
ASR     A 
ASR     A 
MOVE    A1,A0           ; (Assumes EXT contains sign extension) 
MOVE    A2,A1 
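

The idea behind these sequences is that a right shift of 16 bits is a pair of word moves, so a shift of 12 to 
15 bits is a short left shift followed by that 16-bit move, and a shift of 17 to 20 bits is the move preceded 
by a short right shift. The Python sketch below checks this numerically. It is a model under explicit 
assumptions rather than a simulation: the names are hypothetical, the accumulator holds a 32-bit value 
with A2 containing sign extension, and reading A2 is assumed to produce a sign-extended word. 

```python
MASK36 = (1 << 36) - 1


def to_signed(value: int, bits: int) -> int:
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split(acc: int):
    """Return (A2, A1, A0) for a 36-bit accumulator image."""
    return (acc >> 32) & 0xF, (acc >> 16) & 0xFFFF, acc & 0xFFFF


def join(a2: int, a1: int, a0: int) -> int:
    return ((a2 & 0xF) << 32) | ((a1 & 0xFFFF) << 16) | (a0 & 0xFFFF)


def asr_fixed(value32: int, count: int) -> int:
    """Model of ASR12 through ASR20 on a sign-extended 32-bit value."""
    if not 12 <= count <= 20:
        raise ValueError("modeled range is 12 through 20 bits")
    acc = to_signed(value32, 32) & MASK36        # A2 holds sign extension
    if count < 16:
        acc = (acc << (16 - count)) & MASK36     # ASL A, repeated 16 - count times
        a2, a1, _ = split(acc)
        saved = a1                               # PUSH A1
        new_a1 = to_signed(a2, 4) & 0xFFFF       # MOVE A2,A (sign extends, clears A0)
        new_a2 = 0xF if new_a1 & 0x8000 else 0x0
        acc = join(new_a2, new_a1, saved)        # POP A0
    else:
        acc = (to_signed(acc, 36) >> (count - 16)) & MASK36  # ASR A, repeated
        a2, a1, _ = split(acc)
        acc = join(a2, to_signed(a2, 4), a1)     # MOVE A1,A0 / MOVE A2,A1
    return to_signed(acc, 36)


print(asr_fixed(0x12345678, 12) == 0x12345678 >> 12)
print(asr_fixed(0x80000000, 20))
```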


8.2.5.2 Left Shifts (ASL16—ASL19) 


For arithmetic left shifts there is a faster way to shift an accumulator for large shift counts. The following 
code shows how to perform arithmetic left shifts of 16 through 19 bits on an accumulator. This emulation 
works without destroying any registers on the chip. If desired, it is possible to use this technique for bit 
shifts greater than 19, but it is not possible for shifts of 15 or fewer bits without losing information. 


; ASL16 Operation 
; Emulated in 4 Icyc, 4 Instruction Words 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A0,A 
POP     A2 


; ASL17 Operation 
; Emulated in 5 Icyc, 5 Instruction Words 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A0,A 
POP     A2 


; ASL18 Operation 
; Emulated in 6 Icyc, 6 Instruction Words 
ASL     A 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A0,A 
POP     A2 


; ASL19 Operation 
; Emulated in 7 Icyc, 7 Instruction Words 
ASL     A 
ASL     A 
ASL     A 
PUSH    A1              ; (PUSH is a 2-word, 2 Icyc macro) 
MOVE    A0,A 
POP     A2 


8.3 Incrementing and Decrementing Operations 


Almost any piece of data can be incremented or decremented. This section summarizes the different 
increments and decrements available to both registers and memory locations. It is important to note the 
LEA instruction, which is used to increment or decrement AGU pointer registers. The TSTW instruction is 
also used for decrementing AGU pointer registers. This instruction is similar to LEA but also sets the 
condition codes, making it useful for program looping and other tasks. The LEA and TSTW instructions do 
not cause a pipeline dependency in the AGU (see Section 4.4, “Pipeline Dependencies,” on page 4-33). 
The TSTW instruction is not available for incrementing an AGU pointer or for decrementing the SP 
register. 


; Different ways to increment on the DSP56800 core 
INCW    A               ; on a Data ALU Accumulator 
INCW    X0              ; on a Data ALU Input Register 
LEA     (Rn)+           ; on an AGU pointer register (R0-R3 or SP) 
INCW    X:$0            ; on anywhere within the first 64 locations 
                        ; of X data memory 
INCW    X:$C200         ; on anywhere within the entire 64K locations 
                        ; of X data memory 
INCW    X:(SP-37)       ; on a value located on the stack 


; Different ways to decrement on the DSP56800 core 
DECW    A               ; on a Data ALU Accumulator 
DECW    X0              ; on a Data ALU Input Register 
LEA     (Rn)-           ; on an AGU pointer register (R0-R3 or SP) 
TSTW    (Rn)-           ; on an AGU pointer register (R0-R3) 
DECW    X:$0            ; on anywhere within the first 64 locations 
                        ; of X data memory 
DECW    X:$C200         ; on anywhere within the entire 64K locations 
                        ; of X data memory 
DECW    X:(SP-37)       ; on a value located on the stack 


The many different techniques available help to prevent registers from being destroyed. Without them, as 
on some other architectures, it would be necessary to first move data to an accumulator to perform an 
increment. 


8.4 Division 


It is possible to perform fractional or integer division on the DSP56800 core. There are several questions to 
consider when implementing division on the DSP core: 


• Are both operands always guaranteed to be positive? 
• Are operands fractional or integer? 
• Is only the quotient needed, or is the remainder needed as well? 
• Will the calculated quotient fit in 16 bits in integer division? 
• Are the operands signed or unsigned? 
• How many bits of precision are in the dividend? 
• What about overflow in fractional and integer division? 
• Will there be “integer division” effects? 


NOTE: 
In a division equation, the “dividend” is the numerator, the “divisor” is the 
denominator, and the “quotient” is the result. 


Once all these questions have been answered, it is possible to select the appropriate division algorithm. The 
fractional algorithms support a 32-bit signed dividend, and the integer algorithms support a 31-bit signed 
dividend. All algorithms support a 16-bit divisor. 


Note that the most general division algorithms are the fractional and integer algorithms for four-quadrant 
division that generate both a quotient and a remainder. These take the largest number of instruction cycles 
to complete and use the most registers. 


For extended precision division, where the number of quotient bits required is more than 16, the DIV 
instruction and routines presented in this section are no longer applicable. For further information on 
division algorithms, consult the following references (or others as required): 


Theory and Application of Digital Signal Processing, Lawrence R. Rabiner and Bernard Gold 
(Prentice-Hall: 1975), pages 524-530. 


Computer Architecture and Organization, John Hayes (McGraw-Hill: 1978), pages 190-199. 


8.4.1 Positive Dividend and Divisor with Remainder 


The algorithms in the following code are the fastest and take the least amount of program memory. In order 
to use these algorithms, it must be guaranteed that the dividend and divisor are both positive, signed, 
two’s-complement numbers. One algorithm is presented for the division of fractional numbers and a 
second is presented for the division of integer numbers. Both algorithms generate the correct positive 
quotient and positive remainder. 


; Division of Fractional, Positive Data (B1:B0 / X0)

        BFCLR   #$0001,SR       ;Clear carry bit: required for first DIV
        REP     16
        DIV     X0,B            ;Form positive quotient in B0
        ADD     X0,B            ;Restore remainder in B1
                                ; (At this point, the positive quotient is
                                ; in B0 and the positive remainder is in B1)


; Division of Integer, Positive Data (B1:B0 / X0)

        ASL     B               ;Shift of dividend required for integer
                                ; division
        BFCLR   #$0001,SR       ;Clear carry bit: required for first DIV
        REP     16
        DIV     X0,B            ;Form positive quotient in B0
        MOVE    B0,Y1           ;Save quotient in Y1
                                ; (At this point, the positive quotient is in
                                ; B0 but the remainder is not yet correct)
        ADD     X0,B            ;Restore remainder in B1
        ASR     B               ;Required for correct integer remainder
                                ; (At this point, the correct positive
                                ; remainder is in B1)
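

The two listings above can be followed step by step in a small Python model. This is only a sketch: div_step is a hypothetical helper that performs one iteration of textbook non-restoring division (shift left, bring in the carry, add or subtract the divisor aligned with the upper word, then set the carry when the result is non-negative). It is assumed to approximate what each DIV iteration does; it is not a cycle-accurate description of the instruction, and it ignores the 36-bit width of the accumulator.

```python
def div_step(d, s, carry):
    """One non-restoring divide iteration (assumed model of DIV).

    d is the accumulator value (B1:B0 as one signed integer), s is the
    16-bit divisor, which lines up with the upper word of d.
    """
    s_aligned = s << 16
    if (d < 0) != (s < 0):
        d = 2 * d + carry + s_aligned
    else:
        d = 2 * d + carry - s_aligned
    carry = 1 if d >= 0 else 0      # next quotient bit
    return d, carry


def positive_fractional_divide(dividend, divisor):
    """Return (quotient, remainder) as left in B0 and B1."""
    d, carry = dividend, 0          # BFCLR #$0001,SR
    for _ in range(16):             # REP 16 / DIV X0,B
        d, carry = div_step(d, divisor, carry)
    quotient = d & 0xFFFF           # lower word, B0
    d += divisor << 16              # ADD X0,B
    return quotient, d >> 16        # upper word, B1


def positive_integer_divide(dividend, divisor):
    """Return (quotient, remainder) as left in Y1 and B1."""
    d, carry = dividend << 1, 0     # ASL B, then BFCLR #$0001,SR
    for _ in range(16):             # REP 16 / DIV X0,B
        d, carry = div_step(d, divisor, carry)
    quotient = d & 0xFFFF           # MOVE B0,Y1
    d += divisor << 16              # ADD X0,B
    d >>= 1                         # ASR B
    return quotient, d >> 16


print(positive_fractional_divide(0x10000000, 0x4000))  # 0.125 / 0.5
print(positive_integer_divide(64, 9))
```

The model also shows why the integer listing saves the quotient before the final shift: ASR B moves a remainder bit into the lower word, so B0 no longer holds the quotient afterward.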


8.4.2 Signed Dividend and Divisor with No Remainder


The algorithms in the following code provide fast ways to divide two signed, two’s-complement numbers. 
These algorithms are faster because they generate the quotient only; they do not generate a correct 
remainder. The algorithms are referred to as four-quadrant division because they allow any combination of 
positive or negative operands for the dividend and divisor. One algorithm is presented for the division of 
fractional numbers, and a second is presented for the division of integer numbers. 


; 4 Quadrant Division of Fractional, Signed Data (B1:B0 / X0)
; Generates signed quotient only, no remainder

; Setup
        MOVE    B,Y1            ;Save sign bit of dividend (B1) in MSB of Y1
        ABS     B               ;Force dividend positive
        EOR     X0,Y1           ;Save sign bit of quotient in N bit of SR
        BFCLR   #$0001,SR       ;Clear carry bit: required for 1st DIV instr
; Division
        REP     16
        DIV     X0,B            ;Form positive quotient in B0
; Correct quotient
        BGE     DONE            ;If correct result is positive, then done
        NEG     B               ;Else negate to get correct negative result
DONE
                                ; (At this point, the correctly signed
                                ; quotient is in B0 but the remainder is not
                                ; correct)


; 4 Quadrant Division of Integer, Signed Data (B1:B0 / X0)
; Generates signed quotient only, no remainder

; Setup
        ASL     B               ;Shift of dividend required for integer
                                ; division
        MOVE    B,Y1            ;Save sign bit of dividend (B1) in MSB of Y1
        ABS     B               ;Force dividend positive
        EOR     X0,Y1           ;Save sign bit of quotient in N bit of SR
        BFCLR   #$0001,SR       ;Clear carry bit: required for 1st DIV instr
; Division
        REP     16
        DIV     X0,B            ;Form positive quotient in B0
; Correct quotient
        BGE     DONE            ;If correct result is positive, then done
        NEG     B               ;Else negate to get correct negative result
DONE
                                ; (At this point, the correctly signed
                                ; quotient is in B0 but the remainder is not
                                ; correct)
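

The sign handling in these listings can be illustrated with a short Python sketch. It is not the manual's algorithm: the 16 DIV iterations are replaced by Python integer division on the magnitudes, and the function names are hypothetical. What it keeps is the structure of the listings: the quotient sign is the exclusive OR of the two sign bits, the dividend is forced positive, and the quotient is negated at the end when required.

```python
def to_signed(value, bits):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def four_quadrant_quotient(dividend32, divisor16, fractional):
    """Return the 16-bit quotient word that B0 would hold."""
    # MOVE B,Y1 / EOR X0,Y1: the MSB of the XOR is the quotient's sign.
    negative = (((dividend32 >> 16) ^ divisor16) & 0x8000) != 0
    n = abs(to_signed(dividend32, 32))        # ABS B
    d = abs(to_signed(divisor16, 16))
    # A Q31 dividend over a Q15 divisor yields a Q15 quotient: n / (2 * d).
    magnitude = n // (2 * d) if fractional else n // d
    quotient = -magnitude if negative else magnitude   # BGE DONE / NEG B
    return quotient & 0xFFFF


print(hex(four_quadrant_quotient(0xDE5A3C69, 0x31F7, fractional=True)))
print(hex(four_quadrant_quotient(0xE2A0A27B, 0xC1E8, fractional=False)))
```

The two calls use the operands of Examples 8-7 and 8-9 in Section 8.4.4.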


8.4.3 Signed Dividend and Divisor with Remainder


The algorithms in the following code are another way to divide two signed numbers, where both the
dividend and the divisor are signed two’s-complement numbers (positive or negative). These algorithms are
the most general because they generate both a correct quotient and a correct remainder. They are
referred to as four-quadrant division because they allow any combination of positive or negative
operands for the dividend and divisor. One algorithm is presented for the division of fractional numbers
and a second is presented for the division of integer numbers.


; Four-Quadrant Division of Fractional, Signed Data (B1:B0 / X0)
; Generates signed quotient and remainder

; Setup
        MOVE    B1,A            ;Save sign bit of dividend (B1) in MSB of A1
        MOVE    B1,N            ;Save sign bit of dividend (B1) in MSB of N
        ABS     B               ;Force dividend positive
        EOR     X0,Y1           ;Save sign bit of quotient in N bit of SR
        BFCLR   #$0001,SR       ;Clear carry bit: required for first DIV instruction
; Division
        REP     16
        DIV     X0,B
; Correct quotient
        TFR     B,A
        BGE     QDONE           ;If correct result is positive, then done
        NEG     B               ;Else negate to get correct negative result
QDONE
        MOVE    A0,Y1           ;Y1 <- True quotient
        MOVE    X0,A            ;A <- Signed divisor
        ABS     A               ;A <- Absolute value of divisor
        ADD     B,A             ;A1 <- Restored remainder
        BRCLR   #$8000,N,DONE
        MOVE    #0,A0
        NEG     A
DONE
                                ; (At this point, the correctly signed
                                ; quotient is in Y1 and the correct
                                ; remainder in A1)


; Four-Quadrant Division of Integer, Signed Data (B1:B0 / X0)
; Generates signed quotient and remainder

; Setup
        ASL     B               ;Shift of dividend required for integer
                                ; division
        MOVE    B1,A            ;Save sign bit of dividend (B1) in MSB of A1
        MOVE    B1,N            ;Save sign bit of dividend (B1) in MSB of N
        ABS     B               ;Force dividend positive
        EOR     X0,Y1           ;Save sign bit of quotient in N bit of SR
        BFCLR   #$0001,SR       ;Clear carry bit: required for first DIV instruction
; Division
        REP     16
        DIV     X0,B
; Correct quotient
        TFR     B,A
        BGE     QDONE           ;If correct result is positive, then done
        NEG     B               ;Else negate to get correct negative result
QDONE
        MOVE    A0,Y1           ;Y1 <- True quotient
        MOVE    X0,A            ;A <- Signed divisor
        ABS     A               ;A <- Absolute value of divisor
        ADD     B,A             ;A1 <- Restored remainder
        BRCLR   #$8000,N,DONE
        MOVE    #0,A0
        NEG     A
DONE
        ASR     B               ;Shift required for correct integer remainder
                                ; (At this point, signed quotient in Y1, correct
                                ; remainder in A1)
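

The result convention that these listings aim for can be stated compactly in Python. The sketch below is an assumption drawn from the final BRCLR #$8000,N,DONE / NEG A steps, which negate the remainder only when the dividend was negative: the quotient is truncated toward zero and the remainder takes the sign of the dividend. The function name is hypothetical, and the integer case is shown.

```python
def four_quadrant_divmod(dividend, divisor):
    """Integer division truncated toward zero.

    Returns (quotient, remainder) such that
    quotient * divisor + remainder == dividend.
    """
    magnitude, rem = divmod(abs(dividend), abs(divisor))
    # Quotient is negative when the operand signs differ.
    quotient = -magnitude if (dividend < 0) != (divisor < 0) else magnitude
    # Remainder follows the sign of the dividend.
    remainder = -rem if dividend < 0 else rem
    return quotient, remainder


q, r = four_quadrant_divmod(-64, 9)
print(q, r, q * 9 + r)
```

Note that this differs from Python's own // and % operators, which round the quotient toward negative infinity.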


8.4.4 Algorithm Examples


This subsection provides examples of values calculated with the division algorithms in this section. 


Example 8-6. Simple Fractional Division 


A simple example of fractional division is the following case: 
0.125/0.5 = 0.25 


For this case a positive fractional algorithm can be selected. Converting the fractional numbers into hex gives the
following division:


$10000000 / $4000


This gives the following results:


quotient = $2000 = 0.25 
remainder = 0 


Example 8-7. Signed Fractional Division 


Another example of fractional division is the following case: 
-0.2628712165169417858123779297 / 0.39035034179687500 = -0.6734008789062500 


For this case a four-quadrant fractional algorithm can be selected. Converting the fractional numbers into hex gives 
the following division: 


$de5a3c69 / $31f7 
This gives the following results: 
quotient = $a9ce = -0.6734008789062500 


Example 8-8. Simple Integer Division 


A simple example of integer division is the following case: 
64 / 9 = 7 (remainder = 1)


For this case a positive integer algorithm can be selected. Converting the integer numbers into hex gives the
following division:


$00000040 / $0009


This gives the following results:


quotient = $0007 = 7 
remainder = 1 


Example 8-9. Signed Integer Division 


Another example of integer division is the following case: 
-492789125 / -15896 = 31000 


For this case a four-quadrant integer algorithm can be selected. Converting the integer numbers into hex gives the 
following division: 


$e2a0a27b / $c1e8 
This gives the following results: 
quotient = $7918 = 31000 


The results can be easily checked by multiplying the quotient by the divisor and adding the remainder to 
this product. The final answer should be the same as the original dividend. 
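

The check described above can be sketched in Python. The helper names are hypothetical. For the fractional case the sketch assumes a 32-bit dividend with 31 fractional bits and a 16-bit divisor and quotient with 15 fractional bits each, so the product of quotient and divisor must be doubled to line up with the dividend. The helpers return whatever remainder makes the identity hold, so only the quotients from the examples are supplied.

```python
def integer_remainder(dividend, divisor, quotient):
    """Remainder r such that quotient * divisor + r == dividend."""
    return dividend - quotient * divisor


def fractional_remainder(dividend_q31, divisor_q15, quotient_q15):
    """Remainder in dividend units (31 fractional bits).

    (q / 2**15) * (d / 2**15) equals 2 * q * d / 2**31.
    """
    return dividend_q31 - 2 * quotient_q15 * divisor_q15


# Example 8-6: $10000000 / $4000 = $2000
print(fractional_remainder(0x10000000, 0x4000, 0x2000))
# Example 8-8: 64 / 9 = 7
print(integer_remainder(64, 9, 7))
# Example 8-9: a valid quotient leaves |remainder| < |divisor|
r = integer_remainder(-492789125, -15896, 31000)
print(abs(r) < 15896)
```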


8.4.5 Overflow Cases


Both integer and fractional division are subject to division overflow. Overflow is the case where the 
correct value of the quotient will not fit into the destination available to store it. 


For division of fractional numbers, the result must be a 16-bit, signed fractional value greater than or equal
to -1.0 and less than or equal to \(1.0 - 2^{-(N-1)}\), where N is equal to 16. In other words, it must
satisfy the following:


\(-1.0 \le \text{quotient} \le +1.0 - 2^{-(N-1)}\)


For the case where the magnitude of the dividend is larger than the magnitude of the divisor, this inequality
will not be true because any result generated will be larger in magnitude than 1.0. Thus, division overflow
occurs with fractional numbers for the case where the absolute value of the divisor is less than or equal to
the absolute value of the dividend:


\(|\text{divisor}| \le |\text{dividend}|\)


If this condition can be true when dividing fractional numbers, it must be prevented from occurring by first 
scaling the dividend. 


For the division of integer numbers, the result must be a 16-bit, signed integer value greater than or equal
to \(-2^{N-1}\) and less than or equal to \(2^{N-1} - 1\), where N is equal to 16. In other words:


\(-2^{N-1} \le \text{quotient} \le 2^{N-1} - 1\), where N = 16


When integer numbers are being divided, it must be guaranteed that the final result can fit into a signed,
16-bit integer value. If it cannot, the numerator must first be scaled to prevent overflow.
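

The two overflow conditions can be written as predicates that a program might evaluate before dividing. This is a sketch with hypothetical function names; the fractional test assumes a dividend with 31 fractional bits and a divisor with 15, so the divisor is shifted left by 16 before the magnitudes are compared. A zero divisor is reported as overflow in both cases.

```python
def fractional_divide_overflows(dividend_q31, divisor_q15):
    """True when |divisor| <= |dividend|, compared as fractions."""
    return (abs(divisor_q15) << 16) <= abs(dividend_q31)


def integer_divide_overflows(dividend, divisor, n_bits=16):
    """True when the quotient does not fit a signed n_bits integer."""
    if divisor == 0:
        return True
    magnitude = abs(dividend) // abs(divisor)
    quotient = -magnitude if (dividend < 0) != (divisor < 0) else magnitude
    low, high = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    return not (low <= quotient <= high)


print(fractional_divide_overflows(0x10000000, 0x4000))  # 0.125 / 0.5
print(fractional_divide_overflows(0x40000000, 0x4000))  # 0.5 / 0.5
print(integer_divide_overflows(1_000_000, 2))
```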


8.5 Multiple Value Pushes 


The DSP56800 core currently supports a one-word, one-instruction-cycle POP instruction for removing 
information from the stack. The PUSH operation, however, is a two-word, two-instruction-cycle macro, 
which expands to the following code. (This instruction macro works quite well when pushing a single 
variable.) 


; Expansion of the PUSH Instruction Macro
; Emulated in 2 Icyc, 2 Instruction Words
        LEA     (SP)+               ; Increment the SP (1 Icyc, 1 Word)
        MOVE    <register>,X:(SP)   ; Place value onto the stack
                                    ; (1 Icyc, 1 Word)


However, there is a better technique when it is necessary to push more than one value onto the software
stack. Instead of using consecutive PUSH instruction macros, it is more efficient, in both instruction
cycles and instruction words, to expand the PUSH operation by hand:


; Faster technique for pushing multiple values onto the stack
; Finishes in 5 Icyc, 5 Instruction Words

        LEA     (SP)+           ; Increment SP
        MOVE    X0,X:(SP)+
        MOVE    Y0,X:(SP)+
        MOVE    R0,X:(SP)+
        MOVE    R1,X:(SP)       ; No post-increment SP on last MOVE


In this case five instruction cycles and five words are used to push four values onto the software stack. If 
the PUSH instruction macro had been used instead, it would have performed the same function in eight 
instruction cycles with eight words. 
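

The saving generalizes to any number of values. As a small sketch (the function names are hypothetical, and one instruction cycle and one word per instruction is taken from the listings above), the PUSH macro costs two cycles per value, while the expanded form costs one LEA plus one MOVE per value:

```python
def push_macro_cost(values):
    """Icyc (and words) for consecutive PUSH macros: LEA + MOVE each."""
    return 2 * values


def expanded_push_cost(values):
    """Icyc (and words) for one LEA followed by one MOVE per value."""
    return values + 1 if values > 0 else 0


for n in (1, 2, 4):
    print(n, push_macro_cost(n), expanded_push_cost(n))
```

For a single value the two forms cost the same, which is why the macro remains a good fit for pushing one variable.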


Another use of the PUSH instruction is for temporary storage. Sometimes a temporary variable is required, 
such as in swapping two registers. There are two techniques for doing this, the first using an unused 
register and the second using a location on the stack. The second technique uses the PUSH instruction 
macro and works whenever there are no other registers available. The two techniques are shown in the 
following code: 


; Swapping two registers (X0, R0) using an Available Register (N)
; 3 Icyc, 3 Instruction Words

        MOVE    X0,N            ; X0 -> TEMP
        MOVE    R0,X0           ; R0 -> X0
        MOVE    N,R0            ; TEMP -> R0


; Swapping two registers (X0, R0) using a Stack Location
; 4 Icyc, 4 Instruction Words

        PUSH    X0              ; X0 -> TEMP
        MOVE    R0,X0           ; R0 -> X0
        POP     R0              ; TEMP -> R0


The operation is faster using an unused register if one is available. Often, the N register is a good choice 
for temporary storage, as in the preceding example. 


8.6 Loops 


The DSP56800 core contains a powerful and flexible hardware DO loop mechanism. It allows for loop 
counts up to 8,192, it allows a large number of instructions (maximum of 64K) to reside within the body of 
the loop, and hardware DO loops can be interrupted. In addition, loops execute correctly from both on-chip 
and off-chip program memory, and it is possible to single step through the instructions in the loop using the 
OnCE port for emulation. 


The DSP56800 core also contains a hardware REP loop mechanism for simple, fast looping on a single
instruction. It is well suited to simple nesting when the inner loop contains only a single instruction.
For a REP loop, the instruction to be repeated is fetched only once from program memory, reducing
activity on the buses; this is particularly valuable when executing code from off-chip program memory.
However, REP loops are not interruptible.


8.6.1 Large Loops (Count Greater Than 63) 


Currently, the DO instruction allows an immediate loop count of up to 63. When a loop count larger than 63
is needed, it is specified using one of the registers on the DSP56800 core. Since registers are a precious
resource, it is desirable not to use any important registers that may contain valid data. The following
code shows a technique for specifying loop counts greater than 63 without destroying any register values.


        MOVE    #2048,LC        ; Specify a loop count greater than 63
                                ; using the LC register
        DO      LC,LABEL        ; (LC register used to avoid destroying
                                ; another register)
        ; (instructions)
LABEL


Since the LC register is already a dedicated register used for looping and is always loaded by the DO 
instruction, no information is lost when this register is used to specify a larger loop count. Note that this 
technique will also work with the LC register for nested loops, as long as the loading of the LC register 
with immediate data occurs after the LC register is pushed for nested loops. 


NOTE:


This technique should not be used for the REP instruction because it will 
destroy the value of the LC register if done by a REP instruction nested 
within a hardware DO loop. 


8.6.2 Variable Count Loops 


There are cases where it is useful to loop for a variable number of times instead of a constant number of
times. For these cases the loop count is specified using a register. This allows a variable number of loop
iterations from 1 to \(2^k\) times (where k is the number of bits in the LC register, or 13). It is important to
consider what takes place if this variable is zero or negative. Whenever a DO loop is executed and the loop
count is zero, the loop will execute \(2^{13}\) times. For the case where the number of iterations is negative, the
number will simply be interpreted as an unsigned positive number and the loop will be entered. If there is a
possibility that a register value may be less than or equal to zero, then it is necessary to insert extra code
outside of the loop to detect this and branch over the loop. This is demonstrated in the following code.


; Hardware looping when the loop count can be negative or zero

        TSTW    X0              ; Skip over loop if loop count <= 0
        BLE     LABEL
        DO      X0,LABEL
        ASL     A
LABEL


For the case of REP looping on a register value when the register contains the value 0, the instruction to be 
repeated is simply skipped as desired; no extra code is required. This is also true when an immediate value 
of 0 is specified. For the case where the number of iterations can be negative, the response is the same as 
for the DO loop and can be solved using the preceding technique presented for DO looping. 
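

The behavior described in this section can be modeled in a few lines of Python. The sketch assumes that only the low 13 bits of the register reach the LC register, which is one reading of "interpreted as an unsigned positive number"; the function names are hypothetical.

```python
LC_BITS = 13  # width of the LC register, from the text above


def do_iterations(register_value):
    """Iterations of an unguarded DO loop for a given register value."""
    lc = register_value & ((1 << LC_BITS) - 1)   # assumed truncation to 13 bits
    return lc if lc != 0 else 1 << LC_BITS       # a count of zero runs 2**13 times


def guarded_do_iterations(register_value):
    """Iterations when TSTW/BLE branches over the loop for counts <= 0."""
    if register_value <= 0:
        return 0
    return do_iterations(register_value)


for count in (5, 0, -1):
    print(count, do_iterations(count), guarded_do_iterations(count))
```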


8.6.3 Software Loops 


The DSP56800 provides the capability for implementing loops in either hardware or software. For 
non-nested loops in critical code sections, the hardware looping mechanism is always the fastest. However, 
there is a limitation when the hardware looping mechanism is used. The DSP56800 allows a maximum of 
two nested hardware DO loops. Any looping beyond this generates a HWS overflow interrupt. 


Software looping techniques are also efficiently implemented on the DSP core. Software looping simply 
uses a register or memory location and decrements this value until it reaches zero. A branch instruction 
conditionally branches to the top of the loop. 


There are three different techniques for implementing a loop in software: one using a data ALU register, 
one using an AGU register, and one using a memory location to hold the loop count. Each of these is 
shown in the following code. 


; Software Looping
; Data ALU Register Used for Loop Count

        MOVE    #3,X0           ; Load loop count to execute the loop three times
LABEL                           ; Enters loop at least once
        ; (instructions)
        DECW    X0
        BGT     LABEL           ; Back to top-of-loop if positive and not 0


; Software Looping
; AGU Register Used for Loop Count

        MOVE    #3-1,R0         ; Load loop count to execute the loop three times
LABEL                           ; Enters loop at least once
        ; (instructions)
        TSTW    (R0)-
        BGT     LABEL           ; Back to top-of-loop if positive and not 0


; Software Looping
; Memory Location (one of first 64 XRAM locations) Used for Loop Count

        MOVE    #3,X:$7         ; Load loop count to execute the loop three times
LABEL                           ; Enters loop at least once
        ; (instructions)
        DECW    X:$7
        BGT     LABEL           ; Back to top-of-loop if positive and not 0
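

The AGU-register version loads the count minus one, which is easy to overlook. The Python sketch below counts loop passes for both styles. It assumes that TSTW (R0)- sets the condition codes from the value of R0 before the post-decrement, which is consistent with the #3-1 load in the listing; the function names are hypothetical.

```python
def decw_bgt_passes(count):
    """DECW then BGT: decrement first, branch while the result is > 0."""
    passes = 0
    while True:
        passes += 1            # loop body executes
        count -= 1             # DECW X0 (or DECW X:$7)
        if not count > 0:      # BGT falls through
            return passes


def tstw_postdec_passes(count_minus_one):
    """TSTW (R0)- then BGT: test the current value, then post-decrement."""
    r0 = count_minus_one
    passes = 0
    while True:
        passes += 1            # loop body executes
        tested = r0            # flags come from the value before the decrement
        r0 -= 1
        if not tested > 0:     # BGT falls through
            return passes


print(decw_bgt_passes(3), tstw_postdec_passes(3 - 1))
```

Both forms enter the loop body at least once, matching the comments in the listings.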


8.6.4 Nested Loops 


This section gives recommendations for and a detailed discussion of nested loops. 


8.6.4.1 Recommendations 


For nested looping it is recommended that the innermost loop be a hardware DO loop when appropriate 
and that all outer loops be implemented as software loops. Even though it is possible to nest hardware DO 
loops, it is better to implement all outer loops using software looping techniques for two reasons: 


1. The DSP56800 allows only two nested hardware DO loops. 


2. The execution time of an outer hardware loop is comparable to the execution time of a 
software loop. 


Likewise, there is little difference in code size between a software loop and an outer loop implemented 
using the hardware DO mechanism. 


The hardware nesting capability of DO loops should instead be used for efficient interrupt servicing. It is 
recommended that the main program and all subroutines use no nested hardware DO loops. It is also 
recommended that software looping be used whenever there is a JSR instruction within a loop and the 
called subroutine requires the hardware DO loop mechanism. If these two rules are followed, then it can be 
guaranteed that no more than one hardware DO loop is active at a time. If this is the case, then the second 
HWS location is always available to ISRs for faster interrupt processing. This significantly reduces the 
amount of code required to free up and restore the hardware looping resources such as the HWS when 
entering and exiting an ISR, since it is already known upon entering the ISR that a HWS location is 
available. 


If this technique is used, the ISRs should not themselves be interruptible, or, if they can be interrupted, 
then any ISR that can interrupt an ISR already in progress must save off one HWS location. See 
Section 8.12, “Freeing One Hardware Stack Location.” 


The following code shows the recommended nesting technique: 


; Nesting Loops Recommended Technique

        MOVE    #3,X:$0003      ; Set up loop count for outer loop
                                ; (software loop)
OUTER
        ; (instructions)
        DO      X0,INNER        ; DO loop is inner loop (hardware loop)
        ASL     A
        MOVE    A,X:(R0)+
INNER
        ; (instructions)
        DECW    X:$0003         ; Decrement outer loop count
        BGT     OUTER           ; Branch to top of outer loop if not done


It would also be possible to use a data ALU or AGU register if more speed is needed. 


An exception to the preceding recommendation for nesting loops is for the unique case where the 
innermost loop executes a single-word instruction. In this case it is possible to use a REP loop for the 
innermost loop and a hardware DO loop for the outermost loop. This is demonstrated in the following 
code: 


; Nesting Loops Recommended Technique for Special Case of REP Loop Nested
; Within a Hardware DO Loop

        INCW    A
        DO      X0,LABEL        ; DO loop is outer loop (interruptible)
        MOVE    B,Y1
        ; (instructions)
        REP     #4              ; REP loop is inner loop (non-interruptible)
        ASL     A               ; (Must be a one-word instruction)
        ; (instructions)
        MOVE    A,X:(R0)+
LABEL


The REP loop may not be interrupted, however, so this technique may not be useful for large loop counts 
on the innermost loop if there are tight requirements for interrupt latency in an application. If this is the 
case, then the first example with a software outer loop and an inner DO loop may be appropriate. 


8.6.4.2 Nested Hardware DO and REP Loops 


Nesting of hardware DO loops is permitted on the DSP56800 architecture. However, it is not 
recommended that this technique be used for nesting loops within a program. Rather, it is recommended 
that the hardware nesting of DO loops be used to provide more efficient interrupt processing, as described 
in Section 8.6.4.1, “Recommendations.” 


Since the HWS is two locations deep, it is possible to nest one DO loop within another DO loop. 
Furthermore, since the REP instruction does not use the HWS, it is possible to place a REP instruction 
within these two nested DO loops. The following code shows the maximum nesting of hardware loops 
allowed on the DSP56800 processor: 


; Hardware Nested Looping Example of the Maximum Depth Allowed

        DO      #3,OLABEL       ; Beginning of outer loop
        PUSH    LC
        PUSH    LA
        DO      X0,ILABEL       ; Beginning of inner loop
        ; (instructions)
        REP     Y0              ; Skips ASL if Y0 = 0
        ASL     A
        ; (instructions)
ILABEL                          ; End of inner loop
        POP     LA
        POP     LC
        NOP                     ; three instructions required after POP
        NOP                     ; three instructions required after POP
        NOP                     ; three instructions required after POP
OLABEL                          ; End of outer loop


The HWS’s current depth can be determined by the NL and LF bits, as shown in Table 5-3, “Program 
RAM Operating Modes,” on page 5-11. From these bits it is possible to determine whether there are no 
loops currently in progress, a single loop, or two nested loops. Refer to Section 5.1.9.8, “Reserved OMR 
Bits—Bits 2, 7 and 9-14,” on page 5-13 for the values of these bits in these different conditions. 


For nested DO loops, it is required that there be at least three instructions after the POP of the LA and LC 
registers and before the label of any outer loop. This requirement shows up in the preceding example as 
three NOPs but can be fulfilled by any other instructions. 


Further hardware nesting is possible by saving the contents of the HWS and later restoring the stack on 
completion, as described in Section 8.13, “Multitasking and the Hardware Stack.” 


8.6.4.3 Comparison of Outer Looping Techniques 


A comparison of the execution overhead and extra code size of software and hardware outer loops shows 
that for loop nesting, it is just as efficient to nest in software (see Table 8-1). If a data ALU register or 
AGU register is available for use as the loop count, each loop executes one cycle faster than nesting loops 
in hardware. If there are no on-chip registers available for the loop counter, then the third technique can be 
used that uses one of the first 64 locations of X data memory. This technique executes one cycle slower per 
loop than nesting loops in hardware. Each of the software techniques also uses fewer instruction words. 


Table 8-1 Outer Loop Performance Comparison


                                    Number of Icyc    Additional Number of    Total Number of
Loop Technique                      to Set Up Loop    Icyc Executed           Instruction
                                                      Each Loop               Words

Hardware nested DO loops                  3                   5                     7
Software using data ALU register          1                   4                     3
Software using AGU register               1                   4                     3
Software using memory location            2                   6                     4


It is recommended that the nesting of hardware DO loops not be used for implementing nested loops. 
Instead, it is recommended that all outer loops in a nested looping scheme be implemented using software 
looping techniques. Likewise, it is recommended that software looping techniques be used when a loop 
contains a JSR and the called routine contains many instructions or contains a hardware DO loop. 
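

To see how the figures in Table 8-1 add up over a whole loop, the following Python sketch totals the outer-loop overhead for a given number of passes. The cycle counts are copied from the table; the dictionary and function names are hypothetical, and the model simply assumes that the overhead is the setup cost plus the per-pass cost times the number of passes.

```python
# (Icyc to set up loop, additional Icyc executed each loop), from Table 8-1
TABLE_8_1 = {
    "hardware nested DO": (3, 5),
    "software, data ALU register": (1, 4),
    "software, AGU register": (1, 4),
    "software, memory location": (2, 6),
}


def outer_loop_overhead(technique, passes):
    """Total outer-loop overhead in instruction cycles."""
    setup, per_pass = TABLE_8_1[technique]
    return setup + per_pass * passes


for name in TABLE_8_1:
    print(name, outer_loop_overhead(name, 10))
```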


8.6.5 Hardware DO Looping in Interrupt Service Routines


Upon entering an ISR, it is possible that one or two hardware DO loops are currently in progress. This 
means that the hardware looping resources (the LA and LC registers and the HWS) are currently in use and 
may need to be freed up if hardware looping is required within the ISR. 


If the recommendations presented in Section 8.6.4, “Nested Loops,” are followed, then it may be possible 
to guarantee that a maximum of one DO loop is active. In this case the HWS is guaranteed to have at least 
one open location, and the LF and NL bits will correctly indicate the looping status. In this case an ISR 
simply pushes the LA and LC registers upon entering the routine and pops them upon exit. This is very 
efficient, as demonstrated in the following code: 

; Example of an ISR That Uses the Hardware DO Looping Mechanism
; Assumes that at least one HWS location is free
; Overhead is 5 instruction cycles, 5 instruction words

ISR
        LEA     (SP)+           ; Save Hardware Looping Resources
        MOVE    LC,X:(SP)+
        MOVE    LA,X:(SP)
        ; (instructions)
        DO      #7,LABEL        ; Example of a DO loop within an ISR
        INC     A
LABEL
        ; (instructions)
        POP     LA              ; Restore Hardware Looping Resources
        POP     LC
        RTI


Note that this five-cycle, five-word overhead is not required if the hardware DO loop is not required by the 
interrupt service routine. Also note that this overhead is not required if only the hardware REP loop is used 
by the ISR. 


If this technique is used, it is important that any ISR that uses hardware DO looping cannot be interrupted 
by a maskable interrupt and that any non-maskable ISRs save one location of the HWS if they require 
hardware looping. 


For ISRs where it is possible that there are two DO loops currently in progress upon entering the routine, it 
is necessary to free up one HWS location as well. This is accomplished using the technique described in 
Section 8.12, “Freeing One Hardware Stack Location.” 


8.6.6 Early Termination of a DO Loop


There are two techniques that can be used to terminate a DO loop early. In the first technique the loop is 
terminated such that it continues executing the remainder of the instructions in the loop but will not return 
to the top of the loop. In this case it is best to use the following instruction instead of ENDDO: 


        MOVE    #1,LC


This way, the HWS will purge its value at the correct time, as if there is a nesting of hardware DO loops; 
the LC and LA registers will be popped correctly in software. 


There is also the case where it is desirable to conditionally break out of the loop immediately without 
executing any more instructions in the loop. In this case it is recommended to use the technique shown in 
the following code: 


        PUSH    LC              ; Save outer loop registers if nested loop
        PUSH    LA
        DO      #N,LABEL
        ; (instructions in loop)
        Bcc     EXITLP          ; 2 Icyc for each iteration
                                ; 3 Icyc if loop terminates when true
        ; (instructions)
LABEL
        BRA     OVER            ; 3 additional Icyc for BRA when exiting loop
                                ; if normal exit
EXITLP  ENDDO                   ; 1 additional Icyc for ENDDO when exiting
                                ; loop if exit via Bcc
OVER
        POP     LA              ; Restore outer loop registers if nested loop
        POP     LC
;
; ---------- or with another technique ----------
;
        PUSH    LC              ; Save outer loop registers if nested loop
        PUSH    LA
        DO      #N,LABEL
        ; (instructions)
        Bcc     OVER            ; 3 Icyc for each iteration
        ENDDO                   ; 6 Icyc if loop terminates when true
        BRA     LABEL
OVER
        ; (instructions)
LABEL
        POP     LA              ; Restore outer loop registers if nested loop
        POP     LC


8.7 Array Indexes 


The flexible set of addressing modes on the DSP56800 architecture allows for several different ways to
index into arrays. Array indexing usually involves a base address and an offset from this base. The base
address is the address of the first location in the array, and the offset indicates the location of the data in the
array. For example, the first value in the array typically has an offset of 0, whereas the fourth element has
an offset of 3. The nth element is always accessed with an offset of n - 1.


There are two types of arrays typically implemented: global arrays (whose base address is fixed and known 
at assembly time) and local arrays (whose base address may vary as the program is running). Global arrays 
that are small in size can benefit from the single-word instruction that directly accesses the first 128 
locations of the X data memory, as well as the indexed with short displacement addressing mode. 


8.7.1 Global or Fixed Array with a Constant 


This type of array indexing is performed with the X:#xxxx or X:<aa> addressing mode, where the 
assembler adds the base address to the constant offset into the array. Arrays that are small in size can be 
indexed using the X:<aa> addressing mode, saving one program word and one instruction cycle. It is also 
possible to use the X:(Rn+xxxx) or X:(R2+xx) addressing modes if the base address of the array is stored
in an Rn register.


8.7.2 Global or Fixed Array with a Variable


This type of array indexing is performed with the X:(Rn+xxxx), X:(R2+xx), or X:(Rn+N) addressing 
mode. 


In the first two addressing modes—X:(Rn+xxxx) and X:(R2+xx)—the constant value specifies the base 
address of the array, and Rn or R2 specifies the offset into the array. These first two are similar to the 
method used by microcontrollers and are useful when only one or two accesses are performed with a 
particular base address, because it is not necessary to load a register with the base address. The X:(R2+xx) 
addressing mode executes in one fewer instruction cycle and uses one fewer instruction word than the 
X:(Rn+xxxx) addressing mode. It is useful for arrays whose base address is located in the first few 
locations in X data memory. 


In the last addressing mode, X:(Rn+N), Rn is the base address of the array, and N specifies the offset.
This addressing mode is best for the case where many accesses are to be performed into an array. In this 
case the base address is only loaded once into the Rn register and then many accesses can be performed 
using the X:(Rn+N) addressing mode. This addressing mode uses a single program word and executes in 
two instruction cycles. 


8.7.3 Local Array with a Constant 


This type of array indexing is done with the X:(Rn+xxxx) or X:(R2+xx) addressing mode, where Rn holds 
the base address of the array and the constant value specifies the constant offset into the array. (It can also 
be done with the X:(SP+#xxxx) or X:(SP-#xx) addressing mode, but this is not as straightforward.) In this 
case SP holds the address of the end of the stack frame, and the base address of the array is located using a 
constant offset value from the stack pointer. The constant used to index into this local array is added to the 
offset of the base address from the stack pointer to access the desired location of an array stored within the 
stack frame. Stack frames are discussed in Section 8.8, “Parameters and Local Variables.” 


8.7.4 Local Array with a Variable 


This type of array indexing is done with the X:(Rn+N) or X:(SP+N) addressing mode. It is similar to the 
technique described in Section 8.7.3, “Local Array with a Constant,” but, instead of using a constant index, 
the register N holds the variable offset into the array. For the case of X:(SP+N), the N register contains the 
sum of the index into the array and the offset of the array’s base address from the stack pointer. 
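

The address arithmetic for the X:(SP+N) case can be sketched in Python. The names and the sample numbers are hypothetical; the only point illustrated is that N carries both the offset of the array's base from the stack pointer and the index into the array.

```python
def n_for_local_array(base_offset_from_sp, index):
    """Value to load into N: base offset from SP plus the array index."""
    return base_offset_from_sp + index


def effective_address(sp, n):
    """Address accessed by X:(SP+N)."""
    return sp + n


# Hypothetical frame: SP = $0100, array base five words below SP.
sp = 0x0100
n = n_for_local_array(-5, 3)        # fourth element, offset 3
print(n, hex(effective_address(sp, n)))
```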


8.7.5 Array with an Incrementing Pointer 


Often it is desired to sequentially access the elements in an array. This type of array indexing is most often 
done with the X:(Rn)+ addressing mode, where Rn is initialized to the first element of the array of interest 
and sequentially advances to each next element in the array by the automatic post-incrementing address 
mode. In special cases it is also possible to use X:(Rn+N), where N holds the base address and Rn is the 
incrementing array index that is advanced using an LEA (Rn)+ instruction. The latter is useful where it is 
also necessary to have access to the variable that holds the index into the array, which is held in the Rn 
register. 


8.8 Parameters and Local Variables


The DSP56800 software stack supports structured programming techniques, such as parameter passing to 
subroutines and local variables. These techniques can be used for both assembly language programming 
and high-level language compilers. 


Parameters can be passed to a subroutine by placing these variables on the software stack immediately 
before performing a JSR to the subroutine. Placing these variables on the stack is referred to as building a 
“stack frame.” These passed parameters are then accessed in the called subroutines using the stack 
addressing modes available on the DSP56800. This is demonstrated in the following example (which 
destroys the X0 register): 


; Example of Subroutine Call With Passed Parameters

        MOVE    X:$35,X0        ; Pointer variable to be passed to subroutine
        LEA     (SP)+           ; Push variables onto stack
        MOVE    X0,X:(SP)+
        MOVE    X:$21,X0        ; First data variable to be passed to subroutine
        MOVE    X0,X:(SP)+      ; Push onto stack
        MOVE    X:$47,X0        ; Second data variable to be passed to
                                ; subroutine
        MOVE    X0,X:(SP)       ; Push onto stack
        JSR     ROUTINE1
        POP                     ; Remove the three passed parameters from
                                ; stack when done
        POP
        POP

ROUTINE1
        MOVE    #5,N            ; Allocate room for local variables
        LEA     (SP)+N
                                ; (instructions)
        MOVE    X:(SP-9),R0     ; Get pointer variable
        MOVE    X:(SP-7),B      ; Get second data variable
        MOVE    X:(R0),X0       ; Get data pointed to by pointer variable
        ADD     X0,B
        MOVE    B,X:(SP-8)      ; Store sum in first data variable
                                ; (instructions)
        MOVE    #-5,N           ; Release the local variables
        LEA     (SP)+N
        RTS


In a similar manner it is also possible to allocate space and to access variables that are locally used by a 
subroutine, referred to as local variables. This is done by reserving stack locations above the location that 
stores the return address stacked by the JSR instruction. These locations are then accessed using the 
DSP56800’s stack addressing modes. For the case of local variables, the value of the stack pointer is 
updated to accommodate the local variables. For example, if five local variables are to be allocated, then 
the stack pointer is increased by the value of five to allocate space on the stack for these local variables. 
When large numbers of variables are allocated on the stack, it is often more efficient to use the (SP)+N 
addressing mode. 


It is possible to support passed parameters and local variables for a subroutine at the same time. In this case 
the program first pushes all passed parameters onto the stack (see Figure 8-1) using the technique outlined 
in Section 8.5, “Multiple Value Pushes.” Then the JSR instruction is executed, which pushes the return 
address and the SR onto the stack. Upon being entered, the subroutine first allocates space for local 
variables by updating the SP. Then, both passed parameters and local variables can be accessed with the 
stack addressing modes. 
                X Data Memory

        SP ->   Fifth Local Variable
                Fourth Local Variable
                Third Local Variable
                Second Local Variable
                First Local Variable
                Status Register
                Return Address
                Third Passed Parameter
                Second Passed Parameter
                First Passed Parameter


Figure 8-1. Example of a DSP56800 Stack Frame 
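
The SP-relative offsets used in the subroutine example earlier in this section follow directly from the frame layout in Figure 8-1. The Python sketch below is a host-side model of that layout, not target code; the function and key names are assumptions made for illustration.

```python
# Host-side model of the stack frame in Figure 8-1 (illustrative only).

def frame_offsets(num_params, num_locals):
    """Return SP-relative offsets once the subroutine has allocated its locals.

    Layout from low to high address: passed parameters, return address,
    status register, local variables. SP points at the last local variable.
    """
    offsets = {}
    for k in range(num_locals):
        offsets[f"local{k + 1}"] = -(num_locals - 1 - k)
    offsets["sr"] = -num_locals
    offsets["return_address"] = -(num_locals + 1)
    for k in range(num_params):
        offsets[f"param{k + 1}"] = -(num_locals + 2 + (num_params - 1 - k))
    return offsets

offs = frame_offsets(num_params=3, num_locals=5)
print(offs["param1"], offs["param2"], offs["param3"])  # prints: -9 -8 -7
```

With three passed parameters and five local variables, the model reproduces the X:(SP-9), X:(SP-8), and X:(SP-7) accesses used by ROUTINE1.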


8.9 Time-Critical DO Loops 


Often, a program spends most of its time in time-critical loops. For the efficient execution of these loops, it 
is important to have an adequate number of registers. However, sometimes the registers already contain 
data that is not necessary for the critical loop but must not be lost. In this case the DSP56800 architecture 
provides a convenient mechanism for freeing up these registers using the software stack. The programmer 
pushes any registers containing values not required in the tight loop, freeing up these registers for use. 
After completion of the loop, these registers are popped. An example is shown in the following code. 
        MOVE    #$1234,R3       ; Contents of this register not
                                ; required in tight loop
        MOVE    #$5AA,A         ; Contents of this register not
                                ; required in tight loop

        PUSH    R3              ; Prepare for tight loop: X0, Y0 are
                                ; unused and available, and R0 already
                                ; points to that required for loop
        PUSH    A0
        PUSH    A1
        PUSH    A2

; Enter Section with Tight Loop - R3 and A can now be used by tight loop
        MOVE    #$C000,R3
        CLR     A
        MOVE    X:(R0)+,Y0  X:(R3)+,X0
        REP     #32
        MAC     X0,Y0,A  X:(R0)+,Y0  X:(R3)+,X0
        MOVE    A,X:(R2)+       ; store result

        POP     A2              ; tight loop completed, restore
                                ; borrowed registers
        POP     A1
        POP     A0
        POP     R3

In the preceding example there are four PUSH instruction macros in a row. For more efficient and compact 
code, use the technique outlined in Section 8.5, “Multiple Value Pushes.” In certain cases it may also be 
possible to store critical information within the first 64 locations of X data memory, on the top of the stack, 
or in an unused register such as N when an extra location is required within a tight loop itself. 


8.10 Interrupts 


The interrupt mechanism on the DSP56800 is simple, yet flexible. There are two levels of interrupts: 
maskable and non-maskable. All maskable interrupts on the chip can be masked at one spot in the SR. 
Likewise, interrupts from individual peripherals can be masked individually, either within a single 
register, the interrupt priority register (IPR), or at the peripheral itself. It is beneficial to have a 
single register in which all maskable interrupts can be individually masked, because this gives the user 
the capability to set up interrupt priorities within software. 


When programming interrupts, it is necessary to complete the following steps correctly: 
1. Initialize and program the peripheral, enabling interrupts within the peripheral. 
2. Program the IPR to enable interrupts on that particular interrupt channel. 
3. Enable interrupts in the SR. 


8.10.1 Setting Interrupt Priorities in Software 


This section demonstrates several different styles of coding possible for ISRs on the DSP56800 core. In 
counting the number of overhead instruction cycles, it is important to remember that the JSR instruction 
executes in four instruction cycles when entering an interrupt, and that the RTI instruction takes five 
instruction cycles to complete. 
8.10.1.1 High Priority or a Small Number of Instructions 


During ISRs that are short, it is recommended that level 0 interrupts remain disabled. Since the routines are 
short, it is not nearly so important to interrupt them, because they are guaranteed to complete execution 
quickly. This is also recommended for ISRs with a very high priority, which should not be interrupted by 
some other source. 


; Interrupt Service Routine
; DSP56800 core (Interrupts Remain Masked, 9 Overhead Cycles)

        JSR     ISR             ; located in interrupt vector table

ISR                             ; ISR
                                ; (interrupt code)
        RTI


8.10.1.2 Many Instructions of Equal Priority 


For ISRs that require a significant number of instruction cycles to complete, it is possible to reduce the 
interrupt servicing overhead if all interrupts can be considered to have the same priority. This is shown in 
the following generic ISR. 


; Interrupt Service Routine for Long Interrupt
; DSP56800 core (Interrupts Re-enabled, 11 Overhead Cycles)

        JSR     ISR             ; located in interrupt vector table

ISR                             ; Long ISR
        BFCLR   #$0200,SR       ; re-enable interrupts with new mask
                                ; (interrupt code)
        RTI
8.10.1.3 Many Instructions and Programmable Priorities 


For ISRs that require a significant number of instruction cycles to complete, it is possible for the user to 
still program interrupt priorities in software. This is shown in the following generic ISR. 


; Generic ISR - DSP56800 core (20 Overhead Cycles)

        JSR     ISR             ; Instr located in Interrupt Vector Table
                                ; (instructions)
ISR                             ; ISR
        LEA     (SP)+
        MOVE    N,X:(SP)+       ; Save "N" register for usage by ISR
        MOVE    X:IPR,N         ; Save interrupted task's IPR
        MOVE    N,X:(SP)
        MOVE    #xxxx,X:IPR     ; Load new mask - defines which can interrupt
                                ; this ISR
        BFCLR   #$0200,SR       ; Re-enable interrupts with new mask
                                ; (interrupt code)
        POP     N               ; Restore interrupted task's IPR
        MOVE    N,X:IPR
        POP     N               ; Restore saved register used by ISR
        RTI


8.10.2 Hardware Looping in Interrupt Routines 


Since an interrupt can occur at any location in a program, it is possible that the HWS used by hardware DO 
loops may already be full. If an ISR needs to use the DO looping mechanism, it may be necessary to free 
up one location in the HWS. This can be done using the technique outlined in Section 8.12, “Freeing One 
Hardware Stack Location.” Alternatively, if it can be guaranteed that the main program will never use 
more than one DO loop at a time (that is, no nested loops), it may then be possible for an ISR to simply use 
hardware DO loops without using this technique to free up a stack location. 


8.10.3 Identifying System Calls by a Number 


In operating systems, system calls are often made by using an SWI instruction when a user’s task needs 
assistance from the operating system. Usually, it is useful to have several different types of system calls, 
each identified with a number. The following code shows how system calls can have an associated number 
when an SWI instruction is executed. 


MOVE #xx,N ; Put number associated with system call in N reg 
PUSH N ; Push this value on the stack so accessible by O/S 
SWI ; Generate interrupt to return to O/S 
8.11 Jumps and JSRs Using a Register Value 


Sometimes it is necessary to perform a jump or a jump to subroutine using the value stored in an on-chip 
register instead of using an absolute address. The RTS instruction is used to perform this task because it 
takes the value on the software stack and loads it into the program counter, effectively performing a jump. 
The register used for the jump can be any register on the DSP core. 


; JMP <register> Operation
; 8 Icyc

        LEA     (SP)+
                                ; Note: Can use any core register
        MOVE    <register>,X:(SP)+
        MOVE    SR,X:(SP)
        RTS


; Jcc <register> Operation
; 10 Icyc (3 Icyc if condition false)

        Bcc~    OVER            ; (cc~ is the condition exactly opposite the
                                ; desired cc)
        LEA     (SP)+
        MOVE    <register>,X:(SP)+
        MOVE    SR,X:(SP)
        RTS
OVER


; JSR <register> Operation - destroys one register, N
; 11 Icyc

        MOVE    #NEXT,N
        LEA     (SP)+
        MOVE    N,X:(SP)+       ; Push return address onto stack
        MOVE    SR,X:(SP)+      ; Push SR onto stack
        MOVE    <register>,X:(SP)+
                                ; Push address of subroutine onto stack
        MOVE    SR,X:(SP)       ; Push SR onto stack
        RTS                     ; Go to address in top two values on stack
NEXT                            ; Subroutine returns here




8.12 Freeing One Hardware Stack Location 


There are certain cases where a section of code should use DO looping, but it is not clear whether the HWS 
is full or not. An example is an ISR, which may be called when two nested DO loops are in progress. In 
these cases it may be desirable to free a single location on the HWS for use by a section of code such as an 
ISR. The following code shows how to free one location for an ISR: 


; Interrupt Service Routine - Frees Up One HWS Location
; 14 extra Icyc, 12 extra words

ISR
        LEA     (SP)+           ; Push four registers onto the stack
        MOVE    LA,X:(SP)+      ; Save LA register in case already in loop
        MOVE    SR,X:(SP)+      ; Save LF bit in SR register...
        MOVE    LC,X:(SP)+      ; Save LC register...
        MOVE    HWS,X:(SP)      ; Save HWS register...
                                ; (instructions)
        DO      #3,LABEL
        INCW    A
LABEL
                                ; (instructions)
        POP     LA              ; Conditionally restore HWS
        BRCLR   #$8000,X:(SP-1),_OVER
        MOVE    LA,HWS
_OVER
        POP     LC              ; Restore LC register from stack
        POP                     ; Toss SR register from stack
        POP     LA              ; Restore LA register from stack
        RTI


For ISRs that are maskable, it is better to follow the recommendations outlined in Section 8.6.4, “Nested 
Loops,” to reduce the overhead needed for freeing up one HWS location. This greatly simplifies the setup 
code required when entering and exiting the ISR. 


8.13 Multitasking and the Hardware Stack 


For multitasking, it is important to be able to save and later restore the hardware DO loop stack (HWS). 
This section shows code that will perform the save and restore operations. When reading the HWS, two 
locations of the stack are read as well as the current state of the HWS, contained in the NL and LF bits of 
the OMR and SR, respectively. Each read of the HWS register pops the HWS one value, and each write of 
the HWS register pushes the HWS one value. 
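
The pop-on-read and push-on-write behavior can be illustrated with a small host-side model. The Python sketch below assumes that LF and NL together act as a two-bit fill indicator (LF set when at least one location is in use, NL set when both are); the class, its method names, the value returned by an empty read, and the loop entries are illustrative assumptions. The model stores only the LF bit, whereas the code in this section stores the whole SR.

```python
# Host-side model of the two-entry hardware stack (illustrative only).
# Assumption: LF and NL act as a two-bit fill indicator
# (LF set = at least one entry, NL set = two entries).

class HardwareStack:
    def __init__(self):
        self.entries = []      # top of stack is the last element
        self.lf = 0
        self.nl = 0

    def write(self, value):
        """Writing HWS pushes one value: NL takes the old LF, and LF is set."""
        if len(self.entries) == 2:
            raise OverflowError("HWS holds only two locations")
        self.entries.append(value)
        self.nl, self.lf = self.lf, 1

    def read(self):
        """Reading HWS pops one value: LF takes the old NL, and NL is cleared."""
        value = self.entries.pop() if self.entries else 0  # empty-read value is a modeling choice
        self.lf, self.nl = self.nl, 0
        return value

hws = HardwareStack()
hws.write(0x0200)   # hypothetical outer-loop entry
hws.write(0x0240)   # hypothetical inner-loop entry

saved = []
for _ in range(2):  # save sequence: record LF, then read (pop) HWS
    saved.append(hws.lf)
    saved.append(hws.read())
print(saved, hws.lf, hws.nl)
```

After two reads both bits are clear in the model, which is why two reads are enough to guarantee an empty stack before a restore.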
8.13.1 Saving the Hardware Stack 


An example of reading the entire contents of the HWS to X memory is shown in the following code: 


; Save HWS
; 4 Icyc, 4 words

        MOVE    SR,X:(R2)+      ; Read HWS pointer's LSB (LF) and
                                ; save to memory
        MOVE    HWS,X:(R2)+     ; Read first stack location and
                                ; save in X memory
        MOVE    SR,X:(R2)+      ; Read HWS pointer's MSB (NL) and
                                ; save to memory
        MOVE    HWS,X:(R2)+     ; Read second stack location and
                                ; save in X memory


8.13.2 Restoring the Hardware Stack 


When restoring the HWS, it is first necessary that the HWS be empty. If this is unclear, performing two 
reads from the HWS will ensure that the stack is empty. Once this is true, then the HWS can be restored. 
An example of restoring the contents of the HWS from X data memory follows: 


; Restore HWS, 10 words, 14 Icyc worst case
; Assumes R2 points to "stored" HWS
; Destroys R2 register

        MOVE    HWS,LA          ; First read of HWS ensures NL bit is cleared
        MOVE    HWS,LA          ; Second read of HWS ensures LF bit is cleared
        BRCLR   #$8000,X:(R2),OVER
                                ; If LF bit set, then push a value onto HWS
        LEA     (R2)+
        MOVE    X:(R2)+,HWS     ; Puts one value onto stack and sets LF bit
        BRCLR   #$8000,X:(R2),OVER
                                ; If NL bit set, then push a value onto HWS
        LEA     (R2)+
        MOVE    X:(R2)+,HWS
OVER
Chapter 9 
JTAG and On-Chip Emulation (OnCE™) 


The DSP56800 family includes extensive integrated test and debug support. Two modules, the On-Chip 
Emulation (OnCE) module and the test access port (TAP, commonly called the JTAG port) provide board- 
and chip-level testing and software debugging capability. Both are accessed through a common 
JTAG/OnCE interface. Using these modules allows the user to insert the DSP chip into a target system 
while retaining debug control. This capability is especially important for devices without an external bus, 
since it eliminates the need for a costly cable to bring out the footprint of the chip, as required by a 
traditional emulator system. 


The OnCE port is a Motorola-designed module used to debug application software used with the chip. The 
port is a separate on-chip block that allows non-intrusive interaction with the DSP and is accessible 
through the pins of the JTAG interface. The OnCE port makes it possible to examine contents of registers, 
memory, or on-chip peripherals in a special debug environment. No user-accessible resources need be 
sacrificed to perform debugging operations. 


The JTAG port conforms to the IEEE Standard Test Access Port and Boundary-Scan Architecture 
specification (IEEE 1149.1a-1993) as defined by the Joint Test Action Group (JTAG). The JTAG module 
uses a boundary scan technique to test the interconnections between integrated circuits after they are 
assembled onto a printed circuit board. Using a boundary scan allows a tester to observe and control signal 
levels at each component pin through a special register coupled to each pin, called a boundary scan cell. 
This is important for testing continuity and determining if pins are stuck at a one or zero level. 
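
The capture-and-shift idea behind a boundary scan can be sketched in a few lines. The following Python model is conceptual only: it does not model the TAP state machine or any device-specific boundary scan register layout, and the function names, pin count, and test patterns are illustrative assumptions.

```python
# Conceptual model of a boundary scan chain (illustrative only; it does not
# model the TAP state machine or any DSP56800-specific register layout).

def capture(pin_levels):
    """Latch the current level of every pin into its boundary scan cell."""
    return list(pin_levels)

def shift_out(cells):
    """Shift the captured cells out serially, one bit per clock."""
    bits = []
    chain = list(cells)
    while chain:
        bits.append(chain.pop())   # cell nearest the serial output leaves first
    return bits

def stuck_pins(driven_patterns, observed_patterns):
    """Return indexes of pins whose observed level never changed although the driven level did."""
    stuck = []
    for pin in range(len(driven_patterns[0])):
        driven = {p[pin] for p in driven_patterns}
        seen = {p[pin] for p in observed_patterns}
        if len(driven) > 1 and len(seen) == 1:
            stuck.append(pin)
    return stuck

driven = [[0, 0, 0, 0], [1, 1, 1, 1]]
observed = [capture([0, 0, 1, 0]), capture([1, 1, 1, 1])]   # hypothetical: pin 2 stuck at one
print(stuck_pins(driven, observed))
```

Driving both levels on every pin and comparing against the captured values is what lets a tester distinguish a stuck pin from one that merely happens to sit at the expected level.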


This chapter presents an overview of the capabilities of the JTAG and OnCE modules. Since their 
operation is highly dependent upon the architecture of a specific DSP56800 device, the exact 
implementation is necessarily device dependent. For more complete information on interfacing, the debug 
and test commands available, and other implementation details, consult the appropriate device’s user’s 
manual. 


9.1 Combined JTAG and OnCE Interface 


The JTAG and OnCE modules are tightly coupled. The JTAG port provides the interface for both modules 
and handles communications with host development and test systems. Figure 9-1 on page 9-2 shows a 
block diagram of the JTAG/OnCE modules and external host interface. 
Figure 9-1. JTAG/OnCE Interface Block Diagram 


As already noted, the JTAG module is the master. It enables interaction with the debug services provided 
by the OnCE, and its external serial interface is used by the OnCE port for sending and receiving 
debugging commands and data. 


9.2 JTAG Port 


Problems associated with testing high-density circuit boards have led to the development of a proposed 
standard under the sponsorship of the Test Technology Committee of IEEE and the Joint Test Action 
Group (JTAG). The resulting standard, called the IEEE Standard Test Access Port and Boundary-Scan 
Architecture, specifies industry-standard, in-circuit device testing and diagnosis. The DSP56800 family 
provides a dedicated test access port (TAP) that is fully compatible with this standard, commonly referred 
to as the “JTAG port.” 


This section provides an overview of the capabilities of the JTAG port as implemented on the DSP56800. 
Information provided here is intended to supplement the supporting IEEE 1149.1a-1993 document, which 
outlines the internal details, applications, and overall methodology of the standard. Specific details on the 
implementation of the JTAG port for a given DSP56800-based device are provided in that device’s user’s 
manual. 
9.2.1 JTAG Capabilities 


The DSP56800 JTAG port has the following capabilities: 
• Performing boundary scan operations to test circuit-board electrical continuity 
• Sampling the DSP56800-based device system pins during operation and transparently shifting out 
  the result in the boundary scan register; preloading values to output pins prior to performing a 
  boundary scan operation 
• Querying identification information (manufacturer, part number, and version) from a 
  DSP56800-based device 
• Adding a weak pull-up device on all input signals to cause all open inputs to report a logic 1 and to 
  force a predictable internal state while performing external boundary scan operations 
• Disabling the output drive to pins during circuit-board testing 
• Forcing test data onto the outputs of a DSP56800-based device 
• Providing a means of accessing the OnCE controller and circuits to control a target system 
• Providing a means of entering the debug mode of operation 
• Bypassing the DSP56800 core for a given circuit-board test by effectively reducing the boundary 
  scan register to a single cell 


Section 9.2.2, “JTAG Port Architecture,” provides an overview of the port’s architecture and commands. 
For additional information on the JTAG port’s implementation and command set, see the appropriate 
DSP56800-based device’s user’s manual. 


9.2.2 JTAG Port Architecture 


The JTAG module consists of the logic necessary to support boundary scan testing as defined in the IEEE 
specification. Although tightly coupled to the DSP56800’s core logic, it is an independent module, and, 
when disabled, it is guaranteed to have no impact on the function of the core. 


The JTAG port consists of the following components: 
• Serial communications interface 
• Command decoder and interpreter 
• Boundary scan register 
• ID register 


These components, and the overall JTAG port architecture, are shown in Figure 9-2 on page 9-4. 
Figure 9-2. JTAG Block Diagram 


The serial interface supports communications with the host development or test system. It is implemented 
as a serial interface to occupy as few external pins on the device as possible. Consult the device’s user’s 
manual for a full description of the interface signals. All JTAG and OnCE commands and data are sent 
over this interface from the host system. The JTAG interface is also used by the OnCE port when it is 
active. In this mode, the JTAG acts as the OnCE port’s interface controller, and transparently passes all 
communications through to the OnCE port. 


Commands sent to the JTAG module are decoded and processed by the command decoder. Commands for 
the JTAG port are completely independent from the DSP56800 instruction set, and are executed in parallel 
by the JTAG logic. 


Registers in the JTAG module hold chip identification information and the information gathered by 
boundary scan operations. The ID register contains the industry-standard Motorola identification 
information, which is unique for each Motorola DSP. The boundary scan register holds a snapshot of the 
device’s pins when sampled by the JTAG port. 


9.3 OnCE Port 


The OnCE port provides emulation and debug capability directly on the chip, eliminating the need for 
expensive and complicated stand-alone in-circuit emulators (ICEs). The OnCE port permits full-speed, 
non-intrusive emulation on a user’s target system. This section describes the OnCE emulation environment 
for use in debugging real-time embedded applications. 


The OnCE port has an associated interrupt vector in the DSP56800 interrupt vector table. The OnCE 
exception trap is available to the user so that when a debug event (breakpoint or trace occurrence) is 
detected, a level 1 non-maskable interrupt can be generated and the program can initiate the appropriate 
handler routine. 
As emulation capabilities are necessarily tied to the particular implementation of a DSP56800-based 
device, the appropriate device’s user’s manual should be consulted for complete details on implementation 
and supported functions. 


9.3.1 OnCE Port Capabilities 


The capabilities of the OnCE port include the following: 
• Interrupting and breaking into debug mode on a program memory address 
• Interrupting and breaking into debug mode on a data memory address (read, write, or access) 
• Interrupting and breaking into debug mode on an on-chip peripheral register access 
• Entering debug mode using a microprocessor instruction 
• Examining or modifying the contents of any core or memory-mapped peripheral register 
• Examining or modifying any desired sections of program or data memory 
• Full-speed stepping on one or more instructions (up to 256) 
• Tracing one or more instructions 
• Saving or restoring the current state of the chip’s pipeline 
• Displaying the contents of the real-time instruction trace buffer 
• Returning to user mode from debug mode 


Depending on the implementation for a particular DSP56800-based device, additional debugging and 
emulation capabilities may be provided. Consult the user’s manual for the device in question for more 
information. 


9.3.2 OnCE Port Architecture 


The OnCE port module is composed of four different sub-modules, each of which performs a different 
task: 
• Command, status, and control 
• Breakpoint and trace 
• Pipeline save and restore 
• FIFO history buffer 


These units, and the overall OnCE port architecture, are shown in Figure 9-3 on page 9-6. 


Together, these sub-modules provide a full-featured emulation and debug environment. Communication 
with the OnCE port module is handled via the JTAG port, which may therefore be considered the primary 
communications sub-module for the OnCE port, although it operates independently. The operations of the 
OnCE port occur independently of the main DSP56800 core logic, and require no core resources. 


9.3.2.1 Command, Status, and Control 


The command, status and control portion of the OnCE port module handles the processing of emulation 
and debugging commands from a host development system. Communications with a host system are 
provided by the JTAG port module, and are passed transparently through to this logic, which is responsible 
for coordinating all emulation and debugging activity. 


As previously noted, all emulation and debug processing takes place independently of the main DSP56800 
processor core. This allows for instructions to be executed in debug mode at full speed, without any 
overhead introduced by the debugging logic. 


9.3.2.2 Breakpoint and Trace 


The OnCE port module includes address-comparison hardware for setting breakpoints on program or data 
memory accesses. This allows breakpoints to be set on program ROM as well as program RAM locations. 
Breakpoints can be programmed for reads, writes, program fetches, or memory accesses. Breakpoints are 
also possible during on-chip peripheral register accesses, since these are implemented as memory-mapped 
registers in the X data space. 


Full-speed instruction stepping capability is also provided. Up to 256 instructions can be executed at full 
speed before the processor core is halted and the debug processing state is re-entered. This allows the user 
to single step through a program or execute whole functions at a time. 


9.3.2.3 Pipeline Save and Restore 


To resume normal chip activity when the chip is returning from the debug mode, the previous chip pipeline 
state must be reconstructed. The OnCE port module provides logic to correctly save and restore the 
pipeline state when entering and exiting debug mode. Pipeline saves and restores operate transparently to 
the user, although the pipeline state may be examined while in debug mode if desired. 


9.3.2.4 FIFO History Buffer 


To ease debugging activity and to help keep track of program flow, a read-only FIFO buffer is provided 
that tracks the execution history of an application. It stores the address of the instruction currently being 
executed by the processor core, as well as the addresses of the last five execution flow instructions. 


The FIFO history buffer is intended to provide a snapshot of the recent execution history of the processor 
core. To give a larger picture of instruction flow, not all instructions are recorded in the buffer. Only the 
addresses of the following execution flow instructions are stored: 


• BRA 
• JMP 
• JSR 
• Bcc (with condition true) 
• Jcc (with condition true) 


Sequential program flow can be assumed between recorded instructions, so it is possible for the user to 
reconstruct the program flow extending back through quite a large number of instructions. To complete the 
execution history, the first location of the FIFO always holds the address of the last executed instruction, 
regardless of whether or not it caused a change of program flow. 
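
The recording rule can be made concrete with a small model. The Python sketch below is conceptual only: the class, its attribute names, and the program trace are illustrative assumptions. It keeps the address of the most recently executed instruction plus the addresses of the last five recorded change-of-flow instructions.

```python
from collections import deque

# Conceptual model of the FIFO history buffer (illustrative only).
ALWAYS_RECORDED = {"BRA", "JMP", "JSR"}
RECORDED_IF_TAKEN = {"Bcc", "Jcc"}

class HistoryBuffer:
    def __init__(self):
        self.current = None                 # address of the instruction being executed
        self.flow = deque(maxlen=5)         # last five change-of-flow instruction addresses

    def execute(self, address, mnemonic, taken=False):
        self.current = address
        if mnemonic in ALWAYS_RECORDED or (mnemonic in RECORDED_IF_TAKEN and taken):
            self.flow.append(address)

trace = [   # hypothetical program trace: (address, mnemonic, condition true?)
    (0x0100, "MOVE", False),
    (0x0101, "JSR", False),
    (0x0200, "ADD", False),
    (0x0201, "Bcc", False),   # condition false: not recorded
    (0x0202, "Bcc", True),
    (0x0210, "JMP", False),
    (0x0300, "MOVE", False),
]
buf = HistoryBuffer()
for address, mnemonic, taken in trace:
    buf.execute(address, mnemonic, taken)
print(hex(buf.current), [hex(a) for a in buf.flow])
```

Because instructions between recorded entries execute sequentially, the three recorded addresses in this trace are enough to reconstruct the whole path from $0100 to $0300, provided the program listing is available to supply the branch targets.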
Appendix A 
Instruction Set Details 


This appendix contains detailed information about each instruction of the DSP56800 instruction set. It 
contains sections on notation, addressing modes, and condition codes. Also included is a section on 
instruction timing, which shows the number of program words and execution time of each instruction. 
Finally, the instruction set summary, which shows the syntax of all allowed DSP56800 instructions, is 
presented. 


A.1 Notation 


Each instruction description contains notation used to abbreviate certain operands and operations. The 
symbols and their respective descriptions are listed in Table A-1 through Table A-7 on page A-4. 


Table A-1 shows the register set available for the most important move instructions. Sometimes the 
register field is broken into two different fields—one where the register is used as a source and the other 
where it is used as a destination. This is important because a different notation is used when an 
accumulator is being stored without saturation. In addition, see the register fields in Table A-2 on 
page A-2, which are also used in move instructions as sources and destinations within the AGU. 


Table A-1. Register Fields for General-Purpose Writes and Reads 


Register Field    Registers in This Field    Comments
HHH               A, B, A1, B1,              Seven data ALU registers: two accumulators, two 16-bit MSP
                  X0, Y0, Y1                 portions of the accumulators, and three 16-bit data registers
HHHH              A, B, A1, B1,              Seven data ALU and five AGU registers
                  X0, Y0, Y1,
                  R0-R3, N
DDDDD             A, A2, A1, A0,             All CPU registers
                  B, B2, B1, B0,
                  Y1, Y0, X0,
                  R0, R1, R2, R3,
                  N, SP,
                  M01,
                  OMR, SR,
                  LA, LC,
                  HWS


Table A-2 shows the register set available for use as pointers in address-register-indirect addressing 
modes. The most common fields used in this table are Rn and Rj. This table also shows the notation used 
for AGU registers in AGU arithmetic operations. 


Table A-2. Address Generation Unit (AGU) Registers 


Register Field    Registers in This Field    Comments
Rn                R0-R3, SP                  Five AGU registers available as pointers for addressing and as
                                             sources and destinations for move instructions
Rj                R0, R1, R2, R3             Four pointer registers available as pointers for addressing
N                 N                          One index register available only for indexed addressing modes
M01               M01                        One modifier register


Table A-3 shows the register set available for use in data ALU arithmetic operations. The most common 
field used in this table is FDD. 


Table A-3. Data ALU Registers 


Register Field    Registers in This Field    Comments
FDD               A, B,                      Five data ALU registers: two 36-bit accumulators and three 16-bit
                  X0, Y0, Y1                 data registers accessible during data ALU operations.
                                             Contains the contents of the F and DD register fields
F1DD              A1, B1,                    Five data ALU registers: two 16-bit MSP portions of the
                  X0, Y0, Y1                 accumulators and three 16-bit data registers accessible during
                                             data ALU operations
DD                X0, Y0, Y1                 Three 16-bit data registers
F                 A, B                       Two 36-bit accumulators accessible during parallel move
                                             instructions and some data ALU operations
F1                A1, B1                     The 16-bit MSP portion of two accumulators accessible as source
                                             operands in parallel move instructions


Address operands used in the instruction field sections of the instruction descriptions are given in 
Table A-4. Addressing mode operators that are accepted by the assembler for specifying a specific 
addressing mode are shown in Table A-5. 


Table A-4. Address Operands 


Symbol    Description
ea        Effective address
eax       Effective address for X bus
xxxx      Absolute address (16 bits)
pp        I/O short address (6 bits, one-extended)
aa        Absolute address (6 bits, zero-extended)
<...>     Specifies the contents of the specified address
X:        X memory reference
P:        Program memory reference
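
The difference between the one-extended and zero-extended short address fields is easiest to see numerically. The Python sketch below expands each 6-bit field to a 16-bit address; the function names are illustrative assumptions, and the 16-bit width is taken from the absolute address entry in Table A-4.

```python
# Expansion of the 6-bit short address fields to 16-bit addresses (illustrative only).

def expand_aa(field):
    """Absolute short address: 6 bits, zero-extended."""
    if not 0 <= field <= 0x3F:
        raise ValueError("aa field must fit in 6 bits")
    return field

def expand_pp(field):
    """I/O short address: 6 bits, one-extended (upper ten bits set)."""
    if not 0 <= field <= 0x3F:
        raise ValueError("pp field must fit in 6 bits")
    return 0xFFC0 | field

print(hex(expand_aa(0x3F)), hex(expand_pp(0x00)), hex(expand_pp(0x3F)))
# prints: 0x3f 0xffc0 0xffff
```

In this model the aa field reaches the first 64 locations of the address space and the pp field reaches the last 64.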


Table A-5. Addressing Mode Operators 


Symbol    Description
<         I/O short or absolute short addressing mode force operator
>         Long addressing mode force operator
#         Immediate addressing mode operator
#>        Immediate long addressing mode force operator
#<        Immediate short addressing mode force operator


Miscellaneous operand notation, including generic source and destination operands and immediate data 
specifiers, are summarized in Table A-6. 


Table A-6. Miscellaneous Operands 


Symbol       Description
S, Sn        Source operand register
D, Dn        Destination operand register
#xx          Immediate short data (7 bits for MOVE(I), 6 bits for DO/REP)
#xxxx        Immediate data (16 bits)
#ii00        8-bit immediate data mask in the upper byte
#00ii        8-bit immediate data mask in the lower byte
<OFFSET7>    7-bit signed PC relative offset
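
As a worked illustration of the 7-bit signed offset, the Python sketch below encodes and decodes such a field. It assumes a two's-complement representation, which this table does not state explicitly, and the function names are hypothetical.

```python
# Encoding and decoding a 7-bit signed field (assumes two's complement).

def decode_offset7(field):
    """Sign-extend a 7-bit field to a Python int in the range -64..63."""
    if not 0 <= field <= 0x7F:
        raise ValueError("offset field must fit in 7 bits")
    return field - 0x80 if field & 0x40 else field

def encode_offset7(offset):
    """Pack an offset in the range -64..63 into a 7-bit field."""
    if not -64 <= offset <= 63:
        raise ValueError("offset out of range for a 7-bit signed field")
    return offset & 0x7F

print(decode_offset7(0x7F), decode_offset7(0x40), decode_offset7(0x3F))
# prints: -1 -64 63
```

Under this assumption, a target more than 64 words behind or 63 words ahead of the reference point cannot be reached with a 7-bit offset.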
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Table A-7. Other Symbols 


Symbol    Description
()        Optional letter, operand, or operation (see note 1)
(…)       Any arithmetic or logical instruction that allows parallel moves
EXT       Extension register portion of an accumulator (A2 or B2)
LSB       Least significant bit
LSP       Least significant portion of an accumulator (A0 or B0)
LSW       Least significant word
MSB       Most significant bit
MSP       Most significant portion of an accumulator (A1 or B1)
MSW       Most significant word
r         Rounding constant
LIM       Limiting when reading a data ALU accumulator
<Op>      Generic instruction (specifically defined within each section)


1. For instruction names that contain parentheses, such as DEC(W) or IMPY(16), the
portion within the parentheses is optional.


A.2 Programming Model


The registers in the DSP56800 core programming model are shown in Figure A-1. 


[Figure A-1: register diagram. Recoverable labels: Data Arithmetic Logic Unit (data ALU input
registers; 36-bit accumulator registers divided into bits 35–32, 31–16, and 15–0); Address
Generation Unit (pointer registers, offset register, modifier register M01); Program Controller
Unit (program counter, status register (SR), operating mode register, loop address, hardware
stack (HWS), loop counter with bits 12–0); software stack (located in X memory).]


Figure A-1. DSP56800 Core Programming Model


A.3 Addressing Modes


The addressing modes are grouped into three categories: 
• Register direct: directly references the registers on the chip
• Address register indirect: uses an address register as a pointer to reference a location in memory
• Special: includes direct addressing, extended addressing, and immediate data


These addressing modes are described in the following discussion and summarized in Table 4-5 on 
page 4-9. 


All address calculations are performed in the address ALU to minimize execution time and loop overhead. 
Addressing modes specify whether the operands are in registers, in memory, or in the instruction itself 
(such as immediate data) and provide the specific address of the operands. 


The register-direct addressing mode can be subclassified according to the specific register addressed. The
data registers include X0, Y1, Y0, Y, A2, A1, A0, B2, B1, B0, A, and B. The control registers include
HWS, LA, LC, OMR, SR, CCR, and MR. The address registers include R0, R1, R2, R3, SP, N, and M01.


Address-register-indirect modes use an address register Rn (R0–R3) or the stack pointer (SP) to point to
locations in X and P memory. The contents of Rn are the effective address (ea) of the specified operand,
except in the indexed-by-offset or indexed-by-displacement mode, where the effective address (ea) is
(Rn+N) or (Rn+xxxx), respectively. Address-register-indirect modes use the address modifier register
M01 to specify the type of arithmetic to be used to update the address register R0 and optionally R1. R2
and R3 always use linear arithmetic. If an addressing mode specifies the address offset register (N), it is
used to update the corresponding Rn. This arrangement lets the user address a wide variety of
DSP-oriented data structures. All address-register-indirect modes use at least one Rn and sometimes N
and the modifier register (M01). The double X memory read uses two address registers, one for the first
X memory read and one for the second X memory read. Only R3 can be used for this second X memory
read, and R3 is always updated using linear arithmetic.
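
The effective-address rule described above can be sketched in Python. This is only an illustrative model, not the AGU implementation: the function name, the mode strings, and the 16-bit wraparound are assumptions, and only linear arithmetic is modeled (modulo arithmetic selected through M01 is ignored).

```python
def effective_address(rn, mode="indirect", n=0, displacement=0):
    """Return the 16-bit effective address for a simplified indirect mode.

    rn           -- contents of the address register (R0-R3 or SP)
    n            -- contents of the offset register N (indexed-by-offset)
    displacement -- the xxxx value (indexed-by-displacement)
    Hypothetical helper: linear arithmetic only, M01 is not modeled.
    """
    if mode == "indirect":
        ea = rn
    elif mode == "indexed_by_offset":
        ea = rn + n
    elif mode == "indexed_by_displacement":
        ea = rn + displacement
    else:
        raise ValueError(f"unknown mode: {mode}")
    return ea & 0xFFFF  # keep the result within a 16-bit address


print(hex(effective_address(0x1000, "indexed_by_offset", n=4)))
```

In this model, the plain indirect mode returns Rn unchanged, while the two indexed modes add N or the displacement before the address is used.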


The special addressing modes include immediate and absolute addressing modes as well as implied 
references to the program counter (PC), the software stack, the hardware stack (HWS), and the program 
(P) memory. 


The addressing mode selected in the instruction word is further specified by the contents of the address
modifier register M01. The modifier selects whether linear or modulo arithmetic is performed. The
programming of this register is summarized in Table 4-9 on page 4-27.


A.4 Condition Code Computation 


The bits in the Condition Code Register (CCR) are set to reflect the status of the processor after certain 
instructions are executed. The CCR bits are affected by data ALU operations, bit-field manipulation 
instructions, the TSTW instruction, parallel move operations, and by instructions that directly reference the 
CCR register. 


In addition, the computation of some condition code bits is affected by the OMR’s Saturation (SA) and
condition code (CC) bits. The SA bit enables the MAC Output Limiter, which can alter the results of
computations and thus the condition code bits affected. The CC bit specifies whether condition codes are
generated using the information in the extension register. See Section A.4.2, “Effects of the Operating
Mode Register’s SA Bit,” and Section A.4.3, “Effects of the OMR’s CC Bit,” for more information.


A.4.1 The Condition Code Bits


The DSP56800 family defines eight condition code bits, which are contained in the lower-order 8 bits of 
the Status Register (SR) as follows: 


Status Register (SR): Reset = $0300, Read/Write

Bit(s)    Name      Description
15        LF        Loop Flag
14–10     *         Reserved
9–8       I1, I0    Interrupt Mask
7         SZ        Size
6         L         Limit
5         E         Extension
4         U         Unnormalized
3         N         Negative
2         Z         Zero
1         V         Overflow
0         C         Carry

* Indicates reserved bits, read as zero and should be written with zero for future compatibility


Figure A-2. Status Register (SR)


The C, V, Z, N, U, and E bits are true condition code bits that reflect the condition of the result of a data 
ALU operation. These condition code bits are not affected by address ALU calculations or by data 
transfers over the CGDB. The N, Z, and V condition code bits are updated by the TSTW instruction, which 
can operate on both memory and registers. The L bit is a latching overflow bit that indicates that an 
overflow has occurred in the data ALU or that limiting has occurred when moving an accumulator register 
to memory. The SZ bit is a latching bit that indicates the size of an accumulator when it is moved to data 
memory. 
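
As a reading aid, the following Python sketch splits a 16-bit SR value into named flags. The bit positions of SZ through C follow the section headings below (bit 7 down to bit 0). The positions used for LF (bit 15) and I1/I0 (bits 9 and 8) are assumptions of this sketch, chosen to be consistent with the reset value $0300. The names `SR_FLAGS` and `decode_sr` are hypothetical.

```python
# Flag name -> bit position in the 16-bit status register.
# SZ..C come from the section headings; LF and I1/I0 are assumed positions.
SR_FLAGS = {
    "LF": 15, "I1": 9, "I0": 8,
    "SZ": 7, "L": 6, "E": 5, "U": 4,
    "N": 3, "Z": 2, "V": 1, "C": 0,
}


def decode_sr(sr):
    """Return a dict mapping each flag name to 0 or 1 for the given SR value."""
    return {name: (sr >> bit) & 1 for name, bit in SR_FLAGS.items()}


reset_state = decode_sr(0x0300)
print({name: value for name, value in reset_state.items() if value})
```

With the reset value, only the two interrupt mask bits are set in this model and all eight condition code bits are clear.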


A.4.1.1 Size (SZ)—Bit 7 


The SZ bit is set only when moving one of the two accumulators (A or B) to data memory. It is set if, 
during this move, bits 30 and 29 of the specified accumulator are not the same—that is, not 00 or 11—as 
follows: 


SZ = SZ | (Bit 30 ⊕ Bit 29)


SZ is not affected otherwise. Note that the SZ bit is latched once it is set—it is only cleared by a processor 
reset or an instruction that explicitly clears it. 
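
The latching behavior can be illustrated with a short Python sketch. It assumes the accumulator is passed as a 36-bit unsigned integer and that the function is only called when an accumulator is moved to data memory. The name `update_sz` is hypothetical.

```python
def update_sz(sz, accumulator):
    """Return the new SZ value after moving a 36-bit accumulator to data memory.

    SZ is sticky: once it is 1, it stays 1 until explicitly cleared.
    """
    bit30 = (accumulator >> 30) & 1
    bit29 = (accumulator >> 29) & 1
    return sz | (bit30 ^ bit29)


sz = 0
for value in (0x0_6000_0000, 0x0_4000_0000, 0x0_0000_0000):
    sz = update_sz(sz, value)
    print(hex(value), sz)
```

The first value has bits 30 and 29 equal, so SZ stays clear. The second sets SZ, and the third leaves it set because the bit is latched.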


SZ is not affected by the OMR’s CC or SA bits.


A.4.1.2 Limit (L)—Bit 6


The L bit is set to indicate that one of two conditions has occurred: an overflow has occurred in a data ALU 
operation (see Section A.4.1.7, “Overflow (V)—Bit 1,” on page A-10), or limiting has occurred when 
moving one of the two accumulators (A or B) with a move or parallel move instruction. L is not affected 
otherwise. 


The L bit is latched once it is set; it is cleared only by a processor reset or an instruction that explicitly 
clears it. The complete formula for calculating L is the following: 


L = L | V | (limiting due to a move)


L is not affected by the OMR’s CC or SA bits. Note, however, that the V bit is affected by both the CC and 
SA bits. As a result, the L bit can be indirectly affected by these two control bits. 


NOTE: 


The TFR instruction performs a register-to-register transfer and is not
considered a “move” instruction in terms of the preceding discussion. The 
L bit will therefore not be set due to the register-to-register move, even if 
SA is set and saturation occurs. The TFR instruction can set the L bit if it 
has a parallel move and if limiting occurs in that parallel move. 


A.4.1.3 Extension in Use (E)—Bit 5


The E bit is updated based on the result of a data ALU operation to indicate whether the MSP and LSP of 
the result contain all of the significant bits, or if the extension bits are needed to express the result. If the E 
bit is clear, the MSP and LSP contain all the significant bits—the high-order bits represent only sign 
extension. 


Based on the size of the result or destination, the E bit is calculated as follows:

For 20- and 36-bit results or destinations:

E is cleared if the upper 5 bits of the result are 00000 or 11111. E is set otherwise.

For 16-bit results or destinations:

If one of the operands is located in X0, Y0, or Y1, or comes from memory, the value is first sign
extended. Sign extension is also performed when the source operand is located in an accumulator.
If one of the operands is 5-bit immediate data, that value is first zero extended. A 20-bit arithmetic
operation is then performed, where the result is located in the lowest 16 bits. E is cleared if all of
the upper 5 bits of the 20-bit result are 00000 or 11111, and is set otherwise.

For 32-bit results or destinations:

If one of the operands comes from memory or the Y register, or is 16-bit immediate data, it is first
sign extended. Sign extension is also performed when the source operand is located in an
accumulator. If one of the operands is 5-bit immediate data, it is first zero extended. A 36-bit
arithmetic operation is then performed, where the long result is located in the lowest 32 bits. E is
cleared if all of the upper 5 bits of the result are 00000 or 11111 and is set otherwise.
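
A minimal Python sketch of the upper-5-bits test follows. It assumes the result has already been extended as described above and is passed as an unsigned integer of the stated width (36 bits, or 20 bits for 16-bit destinations). The name `extension_in_use` is hypothetical.

```python
def extension_in_use(result, width=36):
    """Return E (0 or 1) for a result held in `width` bits (36 or 20).

    E is 0 when the upper 5 bits are all zeros or all ones, that is,
    when they carry nothing beyond sign extension.
    """
    upper5 = (result >> (width - 5)) & 0x1F
    return 0 if upper5 in (0b00000, 0b11111) else 1


for value in (0x0_7FFF_FFFF, 0x0_8000_0000, 0xF_8000_0000):
    print(hex(value), extension_in_use(value))
```

The middle value needs bit 31 to express its magnitude, so the extension portion is in use. The other two fit in the MSP and LSP.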


E is not affected by the OMR’s CC bit.


NOTE:


When the SA bit in the OMR register is set to one, the E bit is set based on 
the result before passing through the MAC Output Limiter. If SA is set to 
one and saturation does occur in the MAC Output Limiter, this can result 
in the E bit being set, even though the result is saturated to a value where 
the extension portion is not in use. 


A.4.1.4 Unnormalized (U)—Bit 4


The U bit is updated under the following conditions. If the SA bit in the OMR is set to one, this bit is 
cleared if saturation occurs in the MAC Output Limiter. If the SA bit is zero or no saturation occurs, U is 
set if the two MSBs of the MSP of the result are the same following a data ALU operation; it is cleared 
otherwise. The computation of U varies depending on the size of the operation’s destination or result. 


For 20-, 32-, and 36-bit destinations or results, U is computed according to the following formula (32-bit 
destinations are first extended as described for the E bit): 


U = ~(Bit 31 ⊕ Bit 30)

Sixteen-bit destinations are first extended as described for the E bit. Then U is computed as follows:

U = ~(Bit 15 ⊕ Bit 14)

The U bit is not affected by the OMR’s CC bit. 
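
The two formulas differ only in which bit pair is compared, which the following Python sketch makes explicit. The function name and the `saturated` parameter are hypothetical. The parameter stands in for "SA is set and the MAC Output Limiter saturated the result".

```python
def unnormalized(result, msb=31, saturated=False):
    """Return U (0 or 1).

    msb       -- 31 for 20-, 32-, and 36-bit results, 15 for 16-bit results
    saturated -- True when SA = 1 and saturation occurred (U is then cleared)
    """
    if saturated:
        return 0
    top = (result >> msb) & 1
    next_bit = (result >> (msb - 1)) & 1
    return 1 ^ (top ^ next_bit)  # U = ~(top XOR next_bit), as one bit


print(unnormalized(0x2000_0000), unnormalized(0x4000_0000))
```

A value whose two top MSP bits match (for example 00) reports U = 1, meaning it could still be shifted left without losing the sign.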


A.4.1.5 Negative (N)—Bit 3 


The N bit is updated based on the result of a data ALU operation. In general, it reflects the sign bit (MSB) 
of the result, according to the following rules: 
For 20- or 36-bit results:

N = bit 35 for A or B (bit 31 if the OMR’s CC bit is set to one)
N = bit 15 for Y1, Y0, or X0

For 32-bit results:

N = bit 31 for A, B, or Y (the OMR’s CC bit has no effect)
N = bit 15 for Y1, Y0, or X0

For 16-bit results:

N = bit 31 for A, B, or Y (the OMR’s CC bit has no effect)
N = bit 15 for 16-bit destination


When the SA bit in the OMR register is set to one, the N bit is set based on the result before passing 
through the MAC Output Limiter. 


For the ASRAC and LSRAC instructions, the N bit is calculated differently based on the SA bit in the 
OMR register. When the SA bit is zero and the destination is one of the accumulators, the N bit is obtained 
from bit 35. When SA is one and the destination is one of the accumulators, the N bit is set based on bit 31 
of the result before passing through the MAC output limiter. 


For the IMPY instruction, a 31-bit integer product is calculated internally to the data ALU, and the lowest
16 bits of this product are stored in the destination register. When SA is one or CC is one, the N bit is set to
the value in bit 30 of this internally computed result. When SA is zero and CC is zero, the N bit is set to the
value in bit 15 of this internally computed result. These two values are identical except in the case where
overflow occurs (that is, the result is larger than and will not fit in 16 bits).


For the ASLL instruction, if the CC bit is set, the N bit is always cleared. If CC is 0, the N bit is set
according to the standard definition outlined in the preceding discussion.
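
The bit selection for accumulator results and for IMPY can be sketched in Python as follows. This is an illustrative model under stated assumptions: operands to `negative_impy` are ordinary signed Python integers in the 16-bit range, the internal product is modeled as a 31-bit two's-complement value, and both function names are hypothetical.

```python
def negative_accumulator(result36, cc=0):
    """N for a 20- or 36-bit result in A or B: bit 35, or bit 31 when CC = 1."""
    return (result36 >> (31 if cc else 35)) & 1


def negative_impy(a, b, sa=0, cc=0):
    """N for IMPY: bit 30 of the 31-bit product if SA or CC is set, else bit 15."""
    product = (a * b) & 0x7FFF_FFFF  # keep 31 bits of the integer product
    bit = 30 if (sa or cc) else 15
    return (product >> bit) & 1


# 200 * 200 = 40000 does not fit in a signed 16-bit destination,
# so bit 15 and bit 30 of the product disagree.
print(negative_impy(200, 200), negative_impy(200, 200, cc=1))
```

When the product fits in 16 bits, as with -2 * 3, both bit positions give the same sign and the SA and CC settings make no difference.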


A.4.1.6 Zero (Z)—Bit 2 


The Z bit is updated based on the result of a data ALU operation. Z is set if the result of an operation is 
zero—that is, all significant bits are set to zero. It is cleared otherwise. 


The number of bits used to compute the value for Z is determined by the size of the result and whether or 
not the OMR’s CC bit is set: 


For 36-bit results:

Z is set if bits 35 to 0 of the result are all zero, or bits 31 to 0 if the OMR’s CC bit is set.

For 32-bit results:

Z is set if bits 31 to 0 of the result are all zero. It is set using bits 15 to 0 of the result if Y1, Y0, or
X0 is the destination.

For 20-bit results:

Z is set if bits 35 to 16 of the result are all zero, or bits 31 to 16 if the OMR’s CC bit is set.

For 16-bit results:

Z is set if bits 31 to 16 of the result are all zero for A, B, Y; it is set if bits 15 to 0 of the result are
all zero for 16-bit destinations.


Z is not affected by the OMR’s SA bit. 
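
The accumulator cases above reduce to choosing a bit mask, as this Python sketch shows. It covers only results held in a 36-bit accumulator image (sizes 36, 32, and 20). The function name and the `size` parameter are hypothetical.

```python
def zero_flag(result36, size=36, cc=0):
    """Return Z (0 or 1) for a result held in a 36-bit accumulator image."""
    if size == 36:
        mask = 0x0_FFFF_FFFF if cc else 0xF_FFFF_FFFF  # bits 31-0 or 35-0
    elif size == 32:
        mask = 0x0_FFFF_FFFF                            # bits 31-0
    elif size == 20:
        mask = 0x0_FFFF_0000 if cc else 0xF_FFFF_0000  # bits 31-16 or 35-16
    else:
        raise ValueError("size must be 36, 32, or 20")
    return int((result36 & mask) == 0)


# Only the extension portion is nonzero: Z depends on the CC bit.
print(zero_flag(0x1_0000_0000), zero_flag(0x1_0000_0000, cc=1))
```

The example value has a nonzero extension portion only, so it counts as zero only when CC tells the core to ignore the extension bits.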


A.4.1.7 Overflow (V)—Bit 1 


The V bit is updated under the following conditions. If the SA bit in the OMR is set to one, V is set when 
saturation occurs in the MAC Output Limiter. If the SA bit is zero or no saturation occurs, it is set when an 
arithmetic overflow occurs as the result of a data ALU operation. Overflow occurs when the carry into the 
result’s MSB is not equal to the carry out of the MSB, thus changing the sign of the value. The result of the 
ALU operation is therefore not representable in the destination—the result has overflowed. V is cleared 
when overflow does not occur. 


In general, overflow is calculated based on the size of the result or destination of the operation. When the 
CC bit in the OMR is set, however, overflow is determined based on the 32-bit result for what would 
otherwise be 36-bit results. The same is true for 20-bit results: when the CC bit is set, overflow is 
determined based on the 16-bit result. 
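
The carry-in versus carry-out rule can be checked with a small Python model of two's-complement addition. This sketch covers addition only and does not model the MAC Output Limiter. The function name is hypothetical, and the `width` argument stands for the size on which overflow is evaluated (for example 32 instead of 36 when CC is set).

```python
def overflow_on_add(a, b, width=36):
    """Return V (0 or 1) for a two's-complement addition of `width` bits."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    below_msb = mask >> 1  # every bit except the MSB
    carry_in = ((a & below_msb) + (b & below_msb)) >> (width - 1)
    carry_out = (a + b) >> width
    return carry_in ^ carry_out  # overflow when the two carries differ


# The same operands overflow at 32 bits but not at 36 bits.
print(overflow_on_add(0x7FFF_FFFF, 1, width=36),
      overflow_on_add(0x7FFF_FFFF, 1, width=32))
```

The example shows why the CC bit matters: adding 1 to $7FFFFFFF is representable in 36 bits but not in 32.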


For the IMPY instruction, V is set if the computed result does not fit in 16 bits and is cleared otherwise. 
The SA bit has no effect in this case. 


A.4.1.8 Carry (C)—Bit 0 


The C bit is updated based on the result of a data ALU operation. C is set either if a carry is generated out 
of the most significant bit (MSB) of the result for an addition, or if a borrow is generated in a subtraction. 
C is cleared otherwise. 


For 20- or 36-bit results, the carry or borrow is generated out of bit 35. For 32-bit results, the carry or 
borrow is generated out of bit 31. The carry or borrow is generated out of bit 15 for 16-bit results. 
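
A short Python sketch of the carry and borrow rules follows, treating operands as unsigned integers of the result width. The function name and the `subtract` flag are hypothetical, and the exceptions listed in the instruction notes (such as shifts and bit-field tests) are not modeled.

```python
def carry_flag(a, b, width=36, subtract=False):
    """Return C (0 or 1) for a + b, or for a - b when subtract is True."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if subtract:
        return int(a < b)       # a borrow is needed out of the MSB
    return int(a + b > mask)    # a carry is generated out of the MSB


print(carry_flag(0xF_FFFF_FFFF, 1), carry_flag(0, 1, subtract=True))
```

Passing width=32 or width=16 moves the carry position to bit 31 or bit 15, matching the rule for 32-bit and 16-bit results.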


C is not affected by the OMR’s CC or SA bits.


A.4.2 Effects of the Operating Mode Register’s SA Bit


The SA bit in the Operating Mode Register (OMR) can affect the computation of certain condition code
bits. This bit enables the MAC Output Limiter within the data ALU. When enabled, the results of many
operations are limited to fit within 32 bits, the extension portion containing only sign information. This
limiting operation has both direct and indirect effects on the way condition codes are computed.


The SA bit directly affects the following condition code bits:
• U: cleared if saturation occurs in the MAC Output Limiter
• V: set when saturation occurs in the MAC Output Limiter


The remaining bits in the Condition Code Register are not affected by the SA bit, with the following
exceptions:


• L: may be indirectly affected through effects on the V bit
• N: affected only by the ASRAC, LSRAC, and IMPY instructions
• C: affected only by the ASL instruction


The value of the SA bit is designed not to affect condition code computation for the TSTW instructions. 
Only the U condition code bit is affected by the SA bit for the CMP instruction. These instructions operate 
independently of the CC bit and correctly generate both signed and unsigned condition codes. 


The SA bit only affects operations in the data ALU, not operations performed in other blocks. These 
include move instructions, bit-manipulation instructions, and address calculations performed by the AGU. 


NOTE: 


When SA is set to one for an application, condition codes are not always 
set in an intuitive manner. It is best to examine the instruction details to 
determine the effect on condition codes when SA is one. See Section A.7, 
“Instruction Descriptions.” 


A.4.3 Effects of the OMR’s CC Bit 


The CC bit in the OMR may affect the computation of the condition code bits. The CC bit establishes how 
many of the bits of an arithmetic or logic operation result are used when calculating condition codes. 
Specifically: 


• When CC = 0, the result is interpreted as 36 bits with a valid extension portion.
• When CC = 1, the result is interpreted as 32 bits with the extension portion ignored.


Signed values can be computed in both cases, but computation of unsigned values must be performed with 
the CC bit set to one. Without setting CC to one prior to executing the TST and CMP instructions, the HI, 
HS, LO, and LS branch/jump conditions cannot be used. 


When the CC bit is set, the following condition code bits are affected: 
• V: set based on the MSB of the result’s MSP portion
• Z: set using only the MSP and LSP portions of the result


The remaining bits in the Condition Code Register are not affected by the CC bit, with the following 
exceptions: 


• L: may be indirectly affected through effects on the V bit
• N: affected only by the ASRAC, LSRAC, IMPY, and ASLL instructions
• C: affected only by the ASL instruction


The value of the CC bit does not affect condition code computation for the TSTW instructions. These
instructions operate independently of the CC bit and correctly generate both signed and unsigned condition
codes.


The CC bit only affects operations in the data ALU, not operations performed in other blocks. These 
include move instructions, bit-manipulation instructions, and address calculations performed by the AGU. 


A.4.4 Condition Code Summary by Instruction 


Table A-9 provides a detailed view of the condition codes affected by each instruction, and the 
circumstances under which each condition code is set or cleared. Table A-8 describes the notation used. 
Items in the “Notes” column of Table A-9 are explained immediately following the table on page A-15. 


Table A-8. Notation Used for the Condition Code Summary Table 


Notation    Description
*           Set by the result of the operation according to the standard definition.
—           Not affected by the operation.
*16         Set according to the standard definition for 16-bit results.
*32         Set according to the standard definition for 32-bit results.
*36         Set according to the standard definition for 36-bit results.
*A          Set by the result of the operation according to the size of destination.
*B          Set by the result of the operation according to the size of destination.
=0          Cleared.
=1          Set.
?           Set according to the special computation defined for the operation.
(number)    Set according to the special computation defined by the note with the corresponding number.
            The notes may be found immediately after Table A-9.
C           L bit can be set if overflow has occurred in result.
T           L bit can be set if limiting occurs when reading an accumulator during a parallel move or by the
            instruction itself. An example of the latter case is BFCHG #$8000,A, which must first read the
            A accumulator before performing the bit-manipulation operation.
CT          L bit can be set if overflow has occurred in the result or if limiting occurs when an accumulator is
            being read.


The condition code computation shown in Table A-9 may differ from that defined in the opcode
descriptions; see Section A.7, “Instruction Descriptions.” This indicates that the standard definition may be
used to generate the specific condition code result. For example, the Z flag computation for the CLR
instruction is shown as the standard definition, while the opcode description indicates that the Z flag is
always set. Table A-9 gives the chip implementation viewpoint, while the opcode descriptions give the
user viewpoint.


The “Comments” column in the table is also used to report if any of the upper bits in the status register are
modified. These are not status bits because they do not lie in the status portion of the status register, but
rather in the control portion. Sometimes these bits are also affected by instructions. Examples include the
interrupt mask bits, I1 and I0, and the looping bits, LF and NL (NL lies in the OMR register).


The following instruction mnemonics are not found in Table A-9: ANDC, EORC, NOTC, and ORC. This is
because each of these is an alias for another instruction and not an instruction in its own right. To
determine condition code calculation for each of these, determine the instructions to which these
mnemonics are mapped (see Section 6.5.1, “ANDC, EORC, ORC, and NOTC Aliases,” on page 6-12) and
look at the condition code information for the corresponding real instructions.


Table A-9. Condition Code Summary 


Instruction  SZ    L     E     U     N     Z     V     C     Comments
ABS          *     CT    *36   *36   *36   *36   *36   —
ADC          —     C     *36   *36   *36   *36   *36   *36
ADD          *     CT    *A    *A    *A    *A    *A    *A
AND          —     —     —     —     *16   *16   =0    —
ASL          *     CT    *A    *A    *A    *A    (1)   (2)
ASLL         —     —     —     —     (18)  *32   —     —
ASR          *     T     *A    *A    *A    *A    =0    (3)
ASRAC        —     —     —     —     (16)  *36   —     —
ASRR         —     —     —     —     *32   *32   —     —
Bcc          —     —     —     —     —     —     —     —
BFCHG        —     T     —     —     —     —     —     (4)
BFCLR        —     T     —     —     —     —     —     (4)
BFSET        —     T     —     —     —     —     —     (4)
BFTSTH       —     T     —     —     —     —     —     (4)
BFTSTL       —     T     —     —     —     —     —     (5)
BRA          —     —     —     —     —     —     —     —
BRCLR        —     T     —     —     —     —     —     (5)
BRSET        —     T     —     —     —     —     —     (4)
CLR          *     CT    *36   *36   *36   *36   *36   —     Never overflows
CMP          *     CT    *A    *A    *A    *A    *A    *A
DEBUG        —     —     —     —     —     —     —     —
DEC(W)       *     CT    *B    *B    *B    *B    *B    *B
DIV          —     C     —     —     —     —     (1)   (6)


DO           —     T     —     —     —     —     —     —     Affects LF, NL bits
ENDDO        —     —     —     —     —     —     —     —     Condition code not affected
EOR          —     —     —     —     *16   *16   =0    —
ILLEGAL      —     —     —     —     —     —     —     —     Sets I1, I0 bits in SR
IMPY(16)     —     C     —     —     (17)  *16   (15)  —
INC(W)       *     CT    *B    *B    *B    *B    *B    *B
Jcc          —     —     —     —     —     —     —     —
JMP          —     —     —     —     —     —     —     —
JSR          —     —     —     —     —     —     —     —
LEA          —     —     —     —     —     —     —     —
LSL          —     —     —     —     *16   *16   =0    (7)
LSLL         —     —     —     —     *32   *32   —     —
LSR          —     —     —     —     *16   *16   =0    (8)
LSRAC        —     —     —     —     (16)  *36   —     —
LSRR         —     —     —     —     *32   *32   —     —
MAC          *     CT    *A    *A    *A    *A    *A    —
MACR         *     CT    *A    *A    *A    *A    *A    —
MACSU        —     C     *A    *A    *A    *A    *A    —
MOVE         *     T     —     —     —     —     —     —
             (10)  (10)  (10)  (10)  (10)  (10)  (10)  (10)  NA unless SR is the destination in the instruction
MPY          *     CT    *A    *A    *A    *A    *A    —     V cleared
MPYR         *     CT    *A    *A    *A    *A    *A    —     V cleared
MPYSU        —     C     *A    *A    *A    *A    *A    —     V cleared
NEG          *     CT    *A    *A    *A    *A    *A    *A
NOP          —     —     —     —     —     —     —     —
NORM         —     C     *36   *36   *36   *36   (1)   —
NOT          —     —     —     —     *16   *16   =0    —
OR           —     —     —     —     *16   *16   =0    —
POP          —     —     —     —     —     —     —     —

REP          —     T     —     —     —     —     —     —
RND          *     CT    *36   *36   *36   *36   *36   —
ROL          —     —     —     —     *16   *16   =0    (7)
ROR          —     —     —     —     *16   *16   =0    (8)
RTI          Restored (all condition code bits)                    (9)
RTS          —     —     —     —     —     —     —     —
SBC          —     C     *36   *36   *36   *36   *36   *36
STOP         —     —     —     —     —     —     —     —
SUB          *     CT    *A    *A    *A    *A    *A    *A
SWI          —     —     —     —     —     —     —     —     Affects I1, I0 bits in SR
Tcc          —     —     —     —     —     —     —     —
TFR          *     T     —     —     —     —     —     —
TST          *     CT    *36   *36   *36   *36   =0    =0    Never overflows
TSTW         *     —     —     —     *36   *36   =0    =0    Never overflows
WAIT         —     —     —     —     —     —     —     —

NOTES: 


1. V is set if the MSB of the destination operand (bit 35 for an accumulator or bit 31 for the Y
register) is changed as a result of the left shift; V is cleared otherwise.


2. C is set if the MSB of the source operand (bit 35 for an accumulator or bit 31 for the Y
register) is set and is cleared otherwise.


3. C is set if bit 0 of the source operand is set and is cleared otherwise.


4. C is set if all bits specified by the mask are set and is cleared otherwise. Bits that are not set
in the mask should be ignored. If a bit-field instruction is performed on the status register,
all bits in this register selected by the bit field’s mask can be affected.


5. C is set if all bits specified by the mask are cleared and is cleared otherwise. Ignore bits that
are not set in the mask. Note that if a bit-field instruction is performed on the status register,
all bits in this register selected by the bit field’s mask can be affected.


6. C is set if the MSB of the result is cleared (bit 35 for an accumulator or bit 31 for the Y
register). The C bit is cleared if the MSB of the result is set.


7. For the accumulators, C is set if bit 31 of the source operand is set and is cleared otherwise.
For the Y1, Y0, and X0 registers, C is set if bit 15 of the source operand is set and is cleared
otherwise.


8. For the accumulators, C is set if bit 16 of the source operand is set and is cleared otherwise.
For the Y1, Y0, and X0 registers, C is set if bit 0 of the source operand is set and is cleared
otherwise.


9. The “?” bit is set according to value pulled from stack.


10. If the SR is specified as a destination operand (for example, MOVE X:(R0),SR), each bit
is set according to the corresponding bit of the source operand. If SR is not specified as a
destination operand, none of the status bits are affected.


11. C is set if bit 0 of the SP register is set and is cleared otherwise.


12. N is set if bit 15 of the HWS register is set before the ENDDO and is cleared otherwise.


13. Z is set if bits 15-0 of the HWS register are zero before the ENDDO and is cleared
otherwise.


14. The lowest eight condition code bits in the status register are loaded with the value in the 
8-bit FISR register. 


15. The V bit for the IMPY instruction is set if the calculated integer product does not fit in 16 
bits. 


16. The setting of the N bit for the ASRAC and LSRAC instructions depends on the OMR’s 
SA bit. If SA is one, then the N bit is equal to bit 31 of the result. If SA is zero, then N is 
equal to bit 35 of the result. 


17. When SA is zero and CC is zero for the IMPY instruction, the N bit is set using *16. When 
SA is one or CC is set to one, this bit is set as described in Section A.4.1.5, “Negative 
(N)—Bit 3.” 


18. When CC is one for the ASLL instruction, the N bit is cleared. When CC is zero, this bit is 
set as described under Section A.4.1.5, “Negative (N)—Bit 3.” 


See Section 3.6, “Condition Code Generation,” on page 3-33 for additional information on condition 
codes. 


A.5 Instruction Timing 


This section describes how to calculate the DSP56800 instruction timing manually using the provided 
tables. Three complete examples are presented to illustrate the use of the tables. Alternatively, the user can 
obtain the number of instruction program words and the number of oscillator clock cycles required for a 
given instruction by using the simulator; this is a simple and fast method of determining instruction timing 
information. 


The number of words for an instruction depends on the instruction operation and its addressing mode. The 
symbols used in one table may reference subsequent tables to complete the instruction word count. 


The number of oscillator clock cycles per instruction is dependent on many factors, including the number 
of words per instruction, the addressing mode, whether the instruction fetch pipe is full or not, the number 
of external bus accesses, and the number of wait states inserted in each external access. The symbols used 
in one table may reference subsequent tables to complete the execution clock-cycle count. 


The tables in this section present the following information: 


• Table A-11 on page A-18 gives the number of instruction program words and the number of
machine clock cycles for each instruction mnemonic.

• Table A-12 on page A-19 gives the number of additional instruction words (if any) and additional
clock cycles (if any) for each type of parallel move operation.

• Table A-13 on page A-20 gives the number of additional (if any) clock cycles for each type of
MOVEC operation.

• Table A-14 on page A-20 gives the number of additional (if any) clock cycles for each type of
MOVEM operation.

• Table A-15 on page A-20 gives the number of additional (if any) clock cycles for each type of
bit-field manipulation (BFCHG, BFCLR, BFSET, BFTSTH, BFTSTL, BRCLR, and BRSET)
operation.

• Table A-16 on page A-20 gives the number of additional clock cycles (if any) for each type of
branch or jump (Bcc, Jcc, and JSR) operation.

• Table A-17 on page A-21 gives the number of additional clock cycles (if any) for the RTS or RTI
instruction.

• Table A-18 on page A-21 gives the number of additional clock cycles (if any) for the TSTW
instruction.

• Table A-19 on page A-21 gives the number of additional instruction words (if any) and additional
clock cycles (if any) for each effective addressing mode.

• Table A-20 on page A-22 gives the number of additional clock cycles (if any) for external data,
external program, and external I/O memory accesses.


The symbols used in the tables are summarized in Table A-10. 


Table A-10. Instruction Timing Symbols 


Symbol    Description
aio       Time required to access an I/O operand
ap        Time required to access a P memory operand
ax        Time required to access an X memory operand
axx       Time required to access X memory operands for double read
ea        Time or number of words required for an effective address
jx        Time required to execute part of a jump-type instruction
mv        Time or number of words required for a move-type operation
mvb       Time required to execute part of a bit-manipulation instruction
mvc       Time required to execute part of a MOVEC instruction
mvm       Time required to execute part of a MOVEM instruction
mvp       Time required to execute part of a MOVEP instruction
mvs       Time required to execute part of a MOVES instruction
rx        Time required to execute part of an RTS instruction
wp        Number of wait states used in accessing external P memory
wx        Number of wait states used in accessing external X memory


The assumptions for calculating execution time are the following: 


• All instruction cycles are counted in oscillator clock cycles. Two oscillator clock cycles are
equivalent to one instruction cycle.

• The instruction fetch pipeline is full.

• There is no contention for instruction fetches. Thus, external program instruction fetches are
assumed not to have to contend with external data memory accesses.

• There are no wait states for instruction fetches done sequentially (as for non-change-of-flow
instructions), but they are taken into account for change-of-flow instructions that flush the pipeline,
such as JMP, Jcc, RTS, and so on.


In order to better understand and use the following tables, examine the three examples for computing an 
instruction’s execution time that are presented at the end of this section: Example A-1 on page A-22, 
Example A-2 on page A-23, and Example A-3 on page A-25. 


Table A-11. Instruction Timing Summary 


Mnemonic  Words  Clock Cycles     Mnemonic  Words  Clock Cycles
ABS       1      2+mv             LSRAC     1      2
ADC       1      2                LSRR      1      2
ADD       1+ea   2+(ea or mv)     MAC       1      2+mv
AND       1      2                MACR      1      2+mv
ANDC      2+ea   4+mvb            MACSU     1      2
ASL       1      2+mv             MOVE¹     1      2+mv
ASLL      1      2                MOVE(C)   1+ea   2+mvc
ASR       1      2+mv             MOVE(I)   1+ea   2+ea
ASRAC     1      2                MOVE(M)   1      8+mvm
ASRR      1      2                MOVE(P)   1+ea   2+ea
Bcc       1      4+jx             MOVE(S)   1+ea   2+ea
BFCHG     2+ea   4+mvb            MPY       1      2+mv
BFCLR     2+ea   4+mvb            MPYR      1      2+mv
BFSET     2+ea   4+mvb            MPYSU     1      2
BFTSTH    2+ea   4+mvb            NEG       1      2+mv
BFTSTL    2+ea   4+mvb            NOP       1      2
BRA       1      6+jx             NORM      1      2
BRCLR     2+ea   8+mvb+jx         NOT       1      2
BRSET     2+ea   8+mvb+jx         NOTC      2+ea   4+mvb
CLR       1      2+mv             OR        1      2
CMP       1+ea   2+(ea or mv)     ORC       2+ea   4+mvb
DEBUG     1      4                POP       1      2+ea
Table A-11. Instruction Timing Summary (Continued)

Mnemonic  Words  Clock Cycles     Mnemonic  Words  Clock Cycles
DEC(W)    1+ea   2+(ea or mv)     REP       1      6
DIV       1      2                RND       1      2+mv
DO        2      6                ROL       1      2
ENDDO     1      2                ROR       1      2
EOR       1      2                RTI       1      10+rx
EORC      2+ea   4+mvb            RTS       1      10+rx
ILLEGAL   1      4                SBC       1      2
IMPY(16)  1      2                STOP²     1      n/a
INC(W)    1+ea   2+(ea or mv)     SUB       1+ea   2+(ea or mv)
Jcc       2      4+jx             SWI       1      8
JMP       2      6+jx             Tcc       1      2
JSR       2      8+jx             TFR       1      2+mv
LEA       1+ea   2+ea             TST       1      2+mv
LSL       1      2                TSTW      1      2+tst
LSLL      1      2                WAIT³     1      n/a
LSR       1      2


1. This MOVE applies only to the case where two reads are performed in parallel from the X memory. 


2. The STOP instruction disables the internal clock oscillator. After the clock is turned on, an internal 
counter counts 65,536 cycles before enabling the clock to the internal DSP circuits. 


3. The WAIT instruction takes a minimum of 16 cycles to execute when an internal interrupt is pending at 
the time the WAIT instruction is executed. 


Table A-12. Parallel Move Timing 


Parallel Move Operation      + mv Words   + mv Cycles
No parallel data move        0            0
X: (X memory move)           0            ax
X: X: (XX memory move)       0            axx
Table A-13. MOVEC Timing Summary


MOVEC Operation                + mvc Cycles
16-bit immediate → register    2
Register → register            0
X memory ↔ register            ea + ax


Table A-14. MOVEM Timing Summary 


MOVEM Operation          + mvm Cycles
Register ↔ P memory      ap


Note: The “ap” term represents the wait states spent when accessing the program memory 
during DATA read or write operations and does not refer to instruction fetches. 


Table A-15. Bit-Field Manipulation Timing Summary 


Bit-Field Manipulation Operation                        + mvb Cycles
BFCHG, BFCLR, or BFSET on X memory                      ea + (2 * ax)
BFTSTH or BFTSTL on X memory                            ea + ax
BFTSTH, BFTSTL, BFCHG, BFCLR, or BFSET on register      0
BRSET or BRCLR with condition true                      2 + ea + (2 * ax)
BRSET or BRCLR with condition false                     ea + (2 * ax)


Table A-16. Branch/Jump Instruction Timing Summary 


Branch/Jump Instruction Operation    + jx Cycles
Jcc, Bcc, condition true             2 + (2 * ap)
Jcc, Bcc, condition false            (2 * ap)
JMP, JSR                             (2 * ap)

NOTE: 


All two-word jumps execute three program memory fetches to refill the 
pipeline, one of them being the instruction word located at the jump 
instruction’s second-word address + 1. If the jump instruction was fetched 
from a program memory segment with wait states, another “ap” should be 
added to account for that third fetch. 
Table A-17. RTS Timing Summary


Operation        + rx Cycles
RTI, RTS         (2 * ap) + (2 * ax)


NOTE:


The term “2 * ap” represents the two instruction fetches done by the
RTI/RTS instruction to refill the pipeline. The ax term represents fetching
the return address from the software stack when the stack pointer points to
external X memory, and the 2 * ax term includes both this fetch and the
fetch of the SR as performed by the RTI and RTS instructions.


Table A-18. TSTW Timing Summary 


TSTW Operation + tst Cycles 
Register 0 
X memory ea + ax 


Table A-19. Addressing Mode Timing Summary 


Effective Addressing Mode + ea Words + ea Cycles 

Address Register Indirect 

No update 0 0 
Post-increment by 1 0 0 
Post-decrement by 1 0 0 
Post addition by offset Nn 0 0 
Indexed by offset Nn 0 2 
Special 

Immediate data 1 2 
Immediate short data 0 0 
Absolute address 1 2 
Absolute short address 0 0 
I/O short address 0 0 
Implicit 0 0 
Indexed by short displacement 0 2 
Indexed by long displacement 1 4 
Table A-20. Memory Access Timing Summary


Access   X Memory   P Memory   I/O      + ax    + ap    + aio   + axx
Type     Access     Access     Access   Cycle   Cycle   Cycle   Cycle
X:       Int        —          —        0       —       —       —
X:       Ext        —          —        wx¹     —       —       —
P:       —          Int        —        —       0       —       —
P:       —          Ext        —        —       wp²     —       —
IO:      —          —          Int      —       —       0       —
X:X:     Int:Ext    —          —        —       —       —       0
X:X:     Ext:Int    —          —        —       —       —       wx
X:X:     I/O:Int    —          —        —       —       —       0


1. wx—external X memory access wait states 


2. wp—external P memory access wait states 


Three examples using the preceding tables follow. 


Example A-1. Arithmetic Instruction with Two Parallel Reads 


Problem 


Calculate the number of DSP56800 instruction program words and the number of oscillator clock cycles 
required for the following instruction: 


MACR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0

Where the following conditions are true:

• Operating mode register (OMR) = $02 (normal expanded memory map).

• External X memory accesses require zero wait states (assume that external memory requires no
  wait states and that BCR contains the value $00).

• R0 address register = $C000 (external X memory).

• R3 address register = $0052 (internal X memory).

Solution


To determine the number of instruction program words and the number of oscillator clock cycles required 
for the given instruction, the user should perform the following steps: 


1. Look up the number of instruction program words and the number of oscillator clock cycles
required for the opcode-operand portion of the instruction in Table A-11 on page A-18.


According to Table A-11 on page A-18, the MACR instruction will require one instruction 
program word and will execute in (2 + mv) oscillator clock cycles. The term “mv” 
represents the additional instruction program words (if any) and the additional oscillator 
clock cycles (if any) that may be required over and above those needed for the basic MACR 
instruction due to the parallel move portion of the instruction. 
2. Evaluate the “mv” term using Table A-12 on page A-19. 


The parallel move portion of the MACR instruction consists of an XX memory read. 
According to Table A-12 on page A-19, the parallel move portion of the instruction will 
require mv = axx additional oscillator clock cycles. The term “axx” represents the number 
of additional oscillator clock cycles (if any) that are required to access two operands in the 
X memory. 


3. Evaluate the “axx” term using Table A-20 on page A-22. 


The parallel move portion of the MACR instruction consists of an XX Memory Read. 
According to Table A-20 on page A-22, the term “axx” depends upon where the 
referenced X memory locations are located in the DSP56800 memory space. External X 
memory accesses may require additional oscillator clock cycles depending on the memory 
device’s speed. Here we assume external X memory accesses require wx = 0 wait states, that is,
no additional oscillator clock cycles. For this example, the second X memory reference is
assumed to be an internal reference, while the first X memory reference is assumed to be
an external reference. Thus, according to Table A-20 on page A-22, the XX memory
reference in the parallel move portion of the MACR instruction will require axx = wx = 0
additional oscillator clock cycles.


4. Compute the final results.

Thus, based upon the assumptions given for Table A-11 on page A-18, the instruction

MACR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0

will require 1 instruction program word and will execute in
(2 + mv) = (2 + axx) = (2 + wx) = (2 + 0) = 2 oscillator clock cycles.


NOTE: 
If a similar calculation were made for a MOVEC, MOVEM, or one of the
bit-field manipulation instructions (BFCHG, BFCLR, BFSET, or
BFTST), using Table A-12 on page A-19 would no longer be appropriate.
The user would refer to Table A-13 on page A-20, Table A-14 on
page A-20, or Table A-15 on page A-20, respectively.
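
The lookup chain used in Example A-1 (Table A-11, then Table A-12, then Table A-20) can be sketched in Python. This is only an illustrative model: the function names and parameters are assumptions introduced here and do not belong to any Motorola tool.

```python
def axx_cycles(first_read_external: bool, wx: int) -> int:
    """Table A-20 (as modeled here): axx is wx when the first X read is external, else 0."""
    return wx if first_read_external else 0


def macr_dual_read_cycles(first_read_external: bool, wx: int) -> int:
    """Table A-11 lists MACR as 2 + mv; Table A-12 gives mv = axx for an X: X: move."""
    mv = axx_cycles(first_read_external, wx)
    return 2 + mv


# Example A-1: R0 points to external X memory and wx = 0 wait states.
print(macr_dual_read_cycles(first_read_external=True, wx=0))  # 2
```

With a slower external memory (for instance wx = 3), the same call would return 5 under this model.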


Example A-2. Jump Instruction 


Problem 


Calculate the number of DSP56800 instruction program words and the number of oscillator clock cycles 
required for the following instruction: 


JEQ $2000 
Where the following conditions are true: 
• OMR = $02 (normal expanded memory map).

• External P memory accesses require four wait states (an assumption made for this example).


Solution 


To determine the number of instruction program words and the number of oscillator clock cycles required 
for the given instruction, the user should perform the following steps: 


1. Look up the number of instruction program words and the number of oscillator clock cycles 
required for the opcode-operand portion of the instruction in Table A-11 on page A-18. 


According to Table A-11 on page A-18, the Jcc instruction will require two instruction
program words and will execute in (4 + jx) oscillator clock cycles. The term “jx” represents
the number of additional oscillator clock cycles (if any) required for a jump-type
instruction.


2. Evaluate the “jx” term using Table A-16 on page A-20. 


According to Table A-16 on page A-20, the Jcc instruction will require jx additional
oscillator clock cycles. If the condition is true, jx = 2 + (2 * ap), whereas jx = 2 * ap if
the condition is false. The term “ap” represents the number of additional oscillator clock
cycles (if any) that are required to access a P memory operand. Note that the “+ (2 * ap)”
term represents the two program memory instruction fetches executed at the end of a
one-word jump instruction to refill the instruction pipeline.


3. Evaluate the “ap” term using Table A-20 on page A-22. 


According to Table A-20 on page A-22, the term “ap” depends upon where the referenced 
P memory location is located in the 16-bit DSP memory space. External memory accesses 
require additional oscillator clock cycles according to the number of wait states required. 
Here we assume that external P memory accesses require wp = 4 wait states or additional 
oscillator clock cycles. For this example the P memory reference is assumed to be an 
external reference. Thus, according to Table A-20 on page A-22, the Jcc instruction will 
use the value ap = wp = 4 oscillator clock cycles. 


4. Compute the final results.

Thus, based upon the assumptions given for Table A-11 on page A-18, the instruction

JEQ $2000

will require 2 instruction program words and, with the condition true, will execute in
(4 + jx) = (4 + 2 + (2 * ap)) = (4 + 2 + (2 * wp)) = (4 + 2 + (2 * 4)) = 14 oscillator clock cycles.
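
The arithmetic of Example A-2 can be checked with a short Python sketch. The function name `jcc_cycles` is hypothetical, and the model follows only Table A-11 and Table A-16; it does not include the extra “ap” for the third fetch described in the note under Table A-16.

```python
def jcc_cycles(condition_true: bool, ap: int) -> int:
    """Table A-11: Jcc takes 4 + jx cycles. Table A-16: jx = 2 + 2*ap (true) or 2*ap (false)."""
    jx = (2 if condition_true else 0) + 2 * ap
    return 4 + jx


# Example A-2: external P memory with wp = 4 wait states, so ap = 4.
print(jcc_cycles(condition_true=True, ap=4))   # 14
print(jcc_cycles(condition_true=False, ap=4))  # 12
```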
Example A-3. RTS Instruction


Problem 


Calculate the number of DSP56800 instruction program words and the number of oscillator clock cycles 
required for the following instruction: 


RTS 


Where the following conditions are true: 

• OMR = $02 (normal expanded memory map).

• External P memory accesses require four wait states.

• Return Address (on the stack) = $0100 (internal P memory).

Solution


To determine the number of instruction program words and the number of oscillator clock cycles required 
for the given instruction, the user should perform the following steps: 


1. Look up the number of instruction program words and the number of oscillator clock cycles 
required for the opcode-operand portion of the instruction in Table A-11 on page A-18. 


According to Table A-11 on page A-18, the RTS instruction will require one instruction 
program word and will execute in (10 + rx) oscillator clock cycles. The term “rx” represents 
the number of additional oscillator clock cycles (if any) required for an RTS instruction. 


2. Evaluate the “rx” term using Table A-17 on page A-21. 


According to Table A-17 on page A-21, the RTS instruction will require rx = (2 * ap) + (2 * ax)
additional oscillator clock cycles. This example carries only the (2 * ap) part forward, which
matches the table when ax = 0. The term “ap” represents the number of additional
oscillator clock cycles (if any) that are required to access a P memory operand. The term
“(2 * ap)” represents the two program memory instruction fetches executed at the end of
an RTS instruction to refill the instruction pipeline.


3. Evaluate the “ap” term using Table A-20 on page A-22. 


According to Table A-20 on page A-22, the term “ap” depends upon where the referenced 
P memory location is located in the 16-bit DSP memory space. External memory accesses 
may require additional oscillator clock cycles, according to the memory device’s speed. 
Here we assume that external P memory accesses require wp = 4 wait states or additional
oscillator clock cycles. For this example the P memory reference is assumed to be an 
internal reference. This means that the return address ($0100) pulled from the system 
stack by the RTS instruction is in internal P memory. Thus, according to Table A-20 on 
page A-22, the RTS instruction will use the value ap = 0 additional oscillator clock cycles. 


4. Compute the final results. 
Thus, based upon the assumptions given for Table A-11 on page A-18, the instruction 
RTS 


will require one instruction program word and will execute in (10 + rx) = (10 + (2 * ap)) 
= (10 + (2 * 0)) = 10 oscillator clock cycles. 
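
A minimal Python sketch of the RTS calculation follows. The name `rts_cycles` is hypothetical. The `ax` parameter comes from Table A-17 and defaults to 0 here, which is an assumption (stack accesses without wait states) chosen so that the sketch reproduces the result of Example A-3.

```python
def rts_cycles(ap: int, ax: int = 0) -> int:
    """Table A-11: RTS takes 10 + rx cycles. Table A-17: rx = 2*ap + 2*ax."""
    rx = 2 * ap + 2 * ax
    return 10 + rx


# Example A-3: the return address is in internal P memory, so ap = 0.
print(rts_cycles(ap=0))  # 10
```

If the return address were in external P memory with four wait states, `rts_cycles(ap=4)` would give 18 under the same assumptions.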
A.6 Instruction Set Restrictions


These items are restrictions on the DSP56800 instruction set:



• A NORM instruction cannot be immediately followed by an instruction that accesses X memory
  using the R0 pointer. In addition, NORM can only use the R0 address register.

• No bit-field operation (ANDC, ORC, NOTC, EORC, BFCHG, BFCLR, BFSET, BFTSTH,
  BFTSTL, BRCLR, or BRSET) can be performed on the HWS register.

• Only positive immediate values less than 8,192 can be moved to the LC register (13 bits).

• The following registers cannot be specified as the loop count for the DO or REP instruction: HWS,
  SR, OMR, or M01. Similarly, the immediate value of $0 is not allowed for the loop count of a DO
  instruction.

• Any jump, branch, or branch on bit field may not specify the instructions at LA or LA-1 of a
  hardware DO loop as their target addresses. Similarly, these instructions may not be located in the
  last two locations of a hardware DO loop (that is, at LA or at LA-1).

• A REP instruction cannot repeat on an instruction that accesses the P memory or on any multiword
  instruction.

• The HI, HS, LO, and LS condition code expressions can only be used when the CC bit is set in the
  OMR register.

• The access performed using R3 and XAB2/XDB2 cannot reference external memory. This access
  must always be made to internal memory.

• If a MOVE instruction changes the value in one of the address registers (R0-R3), then the contents
  of the register are not available for use until the second following instruction (that is, the
  immediately following instruction should not use the modified register to access X memory or
  update an address). This also applies to the SP register and the M01 register. In addition, it applies
  if a 16-bit immediate value is moved to the N register.

• If a bit-field instruction changes the value in one of the address registers (R0-R3), then the contents
  of the register are not available for use until the second following instruction (that is, the
  immediately following instruction should not use the modified register to access X memory or
  update an address). This also applies to the SP, the N, and the M01 registers.

• For the case of nested hardware DO loops, it is required that there be at least two instructions after
  the pop of the LA and LC registers before the instruction at the last address of the outer loop.
A.7 Instruction Descriptions


This section describes in complete detail each instruction in the DSP56800 Family instruction set. The 
format of each instruction description is given in Section A.1, “Notation,” at the beginning of this 
appendix. Instructions that allow parallel moves include the notation “(parallel move)” in both the 
“Assembler Syntax” and the “Operation” fields. The example given with each instruction discusses the 
contents of all the registers and memory locations referenced by the opcode-operand portion of that 
instruction, though not those referenced by the parallel move portion of that instruction. 


The “Parallel Move Descriptions” section that follows the MOVE instruction description gives a complete
discussion of parallel moves, including examples that discuss the contents of all the registers and memory
locations referenced by the parallel move portion of an instruction.


Whenever an instruction uses an accumulator as both a destination operand for a data ALU operation and 
as a source for a parallel move operation, the parallel move operation will use the value in the accumulator 
prior to the execution of any data ALU operation. 


Whenever a bit in the condition code register is defined according to the standard definition as given in 
Section A.4, “Condition Code Computation,” a brief definition will be given in normal text in the 
“Condition Code” section of that instruction description. Whenever a bit in the condition code register is 
defined according to a special definition for some particular instruction, the complete special definition of 
that bit is given in the “Condition Code” section of that instruction in bold text to alert the user to any 
special conditions concerning its use. 
ABS                              Absolute Value                              ABS


Operation:                               Assembler Syntax:
|D| → D (parallel move)                  ABS D (parallel move)

Description: Take the absolute value of the destination operand (D) and store the result in the destination
accumulator.

Example:
ABS A X:(R0)+,Y0 ; take ABS value, move data into Y0,
                 ; update R0

A Before Execution        A After Execution
F  FFFF  FFF2             0  0000  000E
A2 A1    A0               A2 A1    A0


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $F:FFFF:FFF2. Since this is a negative
number, the execution of the ABS instruction takes the two’s-complement of that value and returns
$0:0000:000E.


Note: When the D operand equals $8:0000:0000 (-16.0 when interpreted as a decimal fraction), the ABS
instruction will cause an overflow to occur since the result cannot be correctly expressed using the
standard 36-bit, fixed-point, two’s-complement data representation. Data limiting does not occur (that is,
A is not set to the limiting value of $7:FFFF:FFFF); the accumulator remains unchanged.
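
The behavior described above, including the $8:0000:0000 corner case, can be modeled with a few lines of Python. This is a sketch of the arithmetic only, not of the hardware; the helper names are assumptions introduced for illustration.

```python
MASK36 = (1 << 36) - 1
MOST_NEGATIVE = 0x8_0000_0000  # -16.0 as a 36-bit fraction


def to_signed36(value: int) -> int:
    """Interpret a 36-bit pattern as a two's-complement integer."""
    value &= MASK36
    return value - (1 << 36) if value & (1 << 35) else value


def abs36(acc: int):
    """Return (result, overflow) for ABS on a 36-bit accumulator value."""
    acc &= MASK36
    if acc == MOST_NEGATIVE:
        return acc, True  # per the note: overflow, accumulator unchanged
    return abs(to_signed36(acc)) & MASK36, False


result, overflow = abs36(0xF_FFFF_FFF2)
print(f"${result >> 32:X}:{(result >> 16) & 0xFFFF:04X}:{result & 0xFFFF:04X}", overflow)
# $0:0000:000E False
```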


Condition Codes Affected: 


  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
  LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result
Instruction Fields:


Operation  Operands  C  Comments
ABS        F         2  Absolute value.

Data ALU Operation        Parallel Memory Read or Write
Operation  Registers      Memory Access   Source or Destination
ABS        A              X:(Rn)+         X0
           B              X:(Rn)+N        Y1
                                          Y0
                                          A
                                          B
                                          A1
                                          B1

Timing: 2 + mv oscillator clock cycles
Memory: 1 program word
ADC                          Add Long with Carry                          ADC


Operation:                               Assembler Syntax:
S + C + D → D (no parallel move)         ADC S,D (no parallel move)

Description: Add the source operand (S) and C to the destination operand (D) and store the result in the destination 
accumulator. Long words (32 bits) may be added to the (36-bit) destination accumulator. 


Usage: This instruction is typically used in multi-precision addition operations (see Section 3.3.8,
“Multi-Precision Operations,” on page 3-23) when it is necessary to add together two numbers that are
larger than 32 bits (such as 64-bit or 96-bit addition).

Example:
ADC Y,A

     Before Execution          After Execution
A    0  2000  8000        A    0  4001  0001
     A2 A1    A0               A2 A1    A0
Y       2000  8000        Y       2000  8000
        Y1    Y0                  Y1    Y0
SR      0301              SR      0300


Explanation of Example: 
Prior to execution, the 32-bit Y register, comprised of the Y1 and Y0 registers, contains the value
$2000:8000, and the 36-bit A accumulator contains the value $0:2000:8000. In addition, C is set to one.
The ADC instruction automatically sign extends the 32-bit Y register to 36 bits and adds this value to
the 36-bit accumulator. In addition, C is added into the LSB of this 36-bit addition. The 36-bit result
is stored back in the A accumulator, and the condition codes are set correctly. The Y1:Y0 register pair
is not affected by this instruction.


Note: C is set correctly for multi-precision arithmetic, using long word operands only when the extension 
register of the destination accumulator (A2 or B2) contains sign extension of bit 31 of the destination 
accumulator (A or B). 
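
The ADC example can be reproduced with a small Python model. This is a sketch of the 36-bit arithmetic only; the function names are hypothetical, and only the carry out of bit 35 is modeled, not the other condition codes.

```python
MASK36 = (1 << 36) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def adc(acc: int, y: int, carry_in: int):
    """Return (36-bit result, carry out of bit 35) for ADC Y,A."""
    y36 = sign_extend(y, 32) & MASK36  # 32-bit Y sign extended to 36 bits
    total = (acc & MASK36) + y36 + carry_in
    return total & MASK36, total >> 36


result, carry = adc(0x0_2000_8000, 0x2000_8000, 1)
print(hex(result), carry)  # 0x40010001 0
```

The cleared carry agrees with the SR value changing from $0301 to $0300 in the example.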
Condition Codes Affected:


  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
  LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

L — Set if overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result is zero; cleared otherwise
V — Set if overflow has occurred in A or B result
C — Set if a carry (or borrow) occurs from bit 35 of A or B result


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields: 


Operation  Operands  C  W  Comments
ADC        Y,F       2  1  Add with carry (sets C bit also)

Timing: 2 oscillator clock cycles
Memory: 1 program word




ADD Add ADD 


Operation:                               Assembler Syntax:
S + D → D (parallel move)                ADD S,D (parallel move)


Description: Add the source operand (S) to the destination operand (D) and store the result in the destination
accumulator. Words (16 bits), long words (32 bits), and accumulators (36 bits) may be added to the
destination.

Usage: This instruction can be used for both integer and fractional two’s-complement data.

Example:
ADD X0,A X:(R0)+,Y0 X:(R3)+,X0 ; 16-bit add, update Y0,X0,R0,R3

     Before Execution          After Execution
A    0  0100  0000        A    0  00FF  0000
     A2 A1    A0               A2 A1    A0
X0      FFFF              X0      FFFF


Explanation of Example: 

Prior to execution, the 16-bit X0 register contains the value $FFFF, and the 36-bit A accumulator
contains the value $0:0100:0000. The ADD instruction automatically appends the 16-bit value in the X0
register with 16 LS zeros, sign extends the resulting 32-bit long word to 36 bits, and adds the result to
the 36-bit A accumulator. Thus, 16-bit operands are always added to the MSP of A or B (A1 or B1),
with the result correctly extending into the extension register (A2 or B2). Operands of 16 bits can be
added to the LSP of A or B (A0 or B0) by loading the 16-bit operand into Y0, forming a 32-bit word
by loading Y1 with the sign extension of Y0, and executing an ADD Y,A or ADD Y,B instruction.
Similarly, the second accumulator can also be used as the source operand.


Note: C is set correctly using word or long word source operands if the extension register of the destination 
accumulator (A2 or B2) contains sign extension from bit 31 of the destination accumulator (A or B). 
C is always set correctly by using accumulator source operands. 
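
The way a 16-bit source is aligned with the MSP of the accumulator can be illustrated with a short Python sketch. The function name `add_word_to_acc` is an assumption made for this illustration, and condition codes are not modeled.

```python
MASK36 = (1 << 36) - 1


def add_word_to_acc(acc: int, word: int) -> int:
    """Add a 16-bit register value to a 36-bit accumulator, as ADD X0,A does."""
    long_word = (word & 0xFFFF) << 16   # append 16 LS zeros
    if long_word & 0x8000_0000:         # sign extend from 32 to 36 bits
        long_word |= 0xF_0000_0000
    return (acc + long_word) & MASK36


print(hex(add_word_to_acc(0x0_0100_0000, 0xFFFF)))  # 0xff0000
```

The printed value corresponds to $0:00FF:0000, the accumulator contents shown after execution in the example.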
Condition Codes Affected:

 |<------------- MR ------------>|<------------ CCR ------------>|
  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
  LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result
C — Set if a carry (or borrow) occurs from bit 35 of A or B result

Instruction Fields:

Data ALU Operation        Parallel Memory Read or Write
Operation  Registers      Memory Access   Source or Destination
ADD        X0,F           X:(Rn)+         X0
           Y1,F           X:(Rn)+N        Y1
           Y0,F                           Y0
                                          A
           A,B                            B
           B,A                            A1
                                          B1
(F = A or B)

Data ALU Operation        First and Second Memory Reads    Destinations for Memory Reads
Operation  Registers      Read1        Read2               Destination1   Destination2
ADD        X0,A           X:(R0)+      X:(R3)+             Y0             X0
           Y1,A           X:(R0)+N     X:(R3)-
           Y0,A                                            Y1             X0
           X0,B           X:(R1)+
           Y1,B           X:(R1)+N                         Valid          Valid
           Y0,B                                            destinations   destinations
                                                           for Read1      for Read2
Instruction Fields:


Operation  Operands        W   Comments
ADD        DD,FDD          1   36-bit addition of two registers
           F1,DD
           ~F,F
           Y,F
           X:(SP-xx),FDD   1   Add memory word to register.
           X:aa,FDD        1   X:aa represents a 6-bit absolute address. Refer to
           X:xxxx,FDD      2   Absolute Short Address (Direct Addressing): <aa> on page 4-22.
           FDD,X:(SP-xx)   2   Add register to memory word, storing the result back to memory
           FDD,X:xxxx      2
           FDD,X:aa        2
           #xx,FDD         1   Add an immediate integer 0-31
           #xxxx,FDD       2   Add a signed 16-bit immediate

Timing: 2 + mv oscillator clock cycles for ADD instructions with a single or dual parallel move.
Refer to previous tables for ADD instructions without a parallel move.

Memory: 1 program word for ADD instructions with a single or dual parallel move.
Refer to previous tables for ADD instructions without a parallel move.
AND                                Logical AND                                AND


Operation:                                       Assembler Syntax:
S•D → D (no parallel move)                       AND S,D (no parallel move)
S•D[31:16] → D[31:16] (no parallel move)         AND S,D (no parallel move)


where • denotes the logical AND operator


Description: Logically AND the source operand (S) with the destination operand (D) and store the result in the
destination. This instruction is a 16-bit operation. If the destination is a 36-bit accumulator, the source is
ANDed with bits 31-16 of the accumulator. The remaining bits of the destination accumulator are not
affected.

Usage: This instruction is used for the logical AND of two registers; the ANDC instruction is appropriate to
AND a 16-bit immediate value with a register or memory location.

Example:
AND X0,A ; AND X0 with A1

     Before Execution          After Execution
A    6  1234  5678        A    6  1200  5678
     A2 A1    A0               A2 A1    A0
X0      7F00              X0      7F00


Explanation of Example: 
Prior to execution, the 16-bit X0 register contains the value $7F00, and the 36-bit A accumulator
contains the value $6:1234:5678. The AND X0,A instruction logically ANDs the 16-bit value in the X0
register with bits 31-16 of the A accumulator (A1) and stores the 36-bit result in the A accumulator.
Bits 35-32 in the A2 register and bits 15-0 in the A0 register are not affected by this instruction.
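
The effect on the accumulator can be modeled in Python as follows. This is an illustrative sketch with a hypothetical function name; it covers only the data result, not the condition codes.

```python
def and_acc(acc: int, src: int) -> int:
    """AND a 16-bit source with bits 31-16 of a 36-bit accumulator; other bits are kept."""
    a1 = (acc >> 16) & 0xFFFF
    new_a1 = a1 & (src & 0xFFFF)
    return (acc & ~(0xFFFF << 16)) | (new_a1 << 16)


print(hex(and_acc(0x6_1234_5678, 0x7F00)))  # 0x612005678
```

The printed value corresponds to $6:1200:5678, with A2 and A0 unchanged.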


Condition Codes Affected: 


  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
  LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


N — Set if bit 31 of A or B result is set
Z — Set if bits 31-16 of A or B result are zero
V — Always cleared


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields: 


Operation  Operands  C  W  Comments
AND        DD,FDD    2  1  16-bit logical AND
           F1,DD

Timing: 2 oscillator clock cycles
Memory: 1 program word
ANDC                         Logical AND, Immediate                         ANDC


Operation:                               Assembler Syntax:
#xxxx•X:<ea> → X:<ea>                    ANDC #iiii,X:<ea>
#xxxx•D → D                              ANDC #iiii,D


where • denotes the logical AND operator


Implementation Note: 
This instruction is an alias to the BFCLR instruction, and assembles as BFCLR with the 16-bit
immediate value inverted (one’s-complement) and used as the bit mask. It will disassemble as a BFCLR
instruction.


Description: Logically AND a 16-bit immediate data value with the destination operand, and store the results back
into the destination. C is also modified as described in the following discussion. This instruction
performs a read-modify-write operation on the destination and requires two destination accesses.


Example:
ANDC #$5555,X:<<$A000 ; AND with immediate data

           Before Execution          After Execution
X:$A000    C3FF            X:$A000   4155
SR         0301            SR        0300


Explanation of Example: 
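Prior to execution, the 16-bit X memory location X:$A000 contains the value $C3FF. Execution of the
instruction ANDs the immediate value $5555 with the contents of X:$A000 and stores the result, $4155,
back in X:$A000. Because the instruction assembles as a BFCLR with the mask $AAAA, and not all of the
bits selected by that mask were set in $C3FF, C is cleared.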
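
The relationship between ANDC and BFCLR stated in the implementation note can be sketched in Python. The function name `andc` is hypothetical, and the model covers only a 16-bit destination and the C bit as defined under “Condition Codes Affected.”

```python
def andc(dest: int, immediate: int):
    """Return (new destination, C) for a 16-bit ANDC, modeled as BFCLR with an inverted mask."""
    mask = ~immediate & 0xFFFF            # BFCLR mask: one's complement of the immediate
    carry = int((dest & mask) == mask)    # C is set only if every masked bit was set
    return dest & ~mask & 0xFFFF, carry   # clear the masked bits


result, carry = andc(0xC3FF, 0x5555)
print(hex(result), carry)  # 0x4155 0
```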


Condition Codes Affected: 


  15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
  LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


For destination operand SR:
? — Cleared as defined in the field and if specified in the field
For other destination operands:
L — Set if data limiting occurred during 36-bit source move
C — Set if all bits specified by the mask are set
    Cleared if not all bits specified by the mask are set
Instruction Fields:


Operation  Operands           C  W  Comments
ANDC       #xxxx,DDDDD        4  2  Logical AND with 16-bit immediate data.
                                    All registers in DDDDD are permitted except HWS.
           #xxxx,X:(R2+xx)    6  2
           #xxxx,X:(SP-xx)    6  2  X:aa represents a 6-bit absolute address. Refer to
           #xxxx,X:aa         4  2  Absolute Short Address (Direct Addressing): <aa> on page 4-22.
           #xxxx,X:pp         4  2  X:pp represents a 6-bit absolute I/O address. Refer to
           #xxxx,X:xxxx       6  3  I/O Short Address (Direct Addressing): <pp>.

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table
ASL                           Arithmetic Shift Left                           ASL


Operation:                               Assembler Syntax:
(see following figure)                   ASL D (parallel move)

C ← [ D2 | D1 | D0 ] ← 0   (parallel move)


Description: Arithmetically shift the destination operand (D) 1 bit to the left and store the result in the destination
accumulator. The MSB of the destination prior to the execution of the instruction is shifted into C, and
a zero is shifted into the LSB of the destination.


Implementation Note: 
When a 16-bit register is specified as the operand for ASL, this instruction is actually assembled as an 
LSL with the same register argument. 


Example: 
ASL A X:(R3)+N,Y0 ; multiply A by 2, update R3,Y0

     Before Execution          After Execution
A    A  0123  0123        A    4  0246  0246
     A2 A1    A0               A2 A1    A0
SR      0300              SR      0373


Explanation of Example: 

Prior to execution, the 36-bit A accumulator contains the value $A:0123:0123. Execution of the
ASL A instruction shifts the 36-bit value in the A accumulator 1 bit to the left and stores the result
back in the A accumulator. C is set by the operation because bit 35 of A was set prior to the execution
of the instruction. The V bit of CCR (bit 1) is also set because bit 35 of A has changed during the
execution of the instruction. The U bit of CCR (bit 4) is set because the result is not normalized, the E bit
of CCR (bit 5) is set because the signed integer portion of the result is in use, and the L bit of CCR (bit
6) is set because an overflow has occurred.
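
The shift itself and the C and V results described above can be modeled with a brief Python sketch. The function name `asl36` is an assumption for illustration; the U, E, and L bits are not modeled.

```python
MASK36 = (1 << 36) - 1


def asl36(acc: int):
    """Return (result, C, V) for a 1-bit arithmetic left shift of a 36-bit accumulator."""
    acc &= MASK36
    carry = acc >> 35                    # old bit 35 is shifted into C
    result = (acc << 1) & MASK36         # a zero enters the LSB
    overflow = carry ^ (result >> 35)    # V is set if bit 35 changed
    return result, carry, overflow


result, c, v = asl36(0xA_0123_0123)
print(hex(result), c, v)  # 0x402460246 1 1
```

The printed value corresponds to $4:0246:0246 with both C and V set, in agreement with the example.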

Condition Codes Affected: 


 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if bit 35 of A or B result is changed due to left shift
C — Set if bit 35 of A or B was set prior to the execution of the instruction


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields: 


Operation   Operands   C   W   Comments
ASL         FDD        2   1   Arithmetic shift left entire register by 1 bit

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access   Source or Destination
ASL         A               X:(Rn)+         X0
            B               X:(Rn)+N        Y1
                                            Y0
                                            A1
                                            B1
                                            A
                                            B

Timing: 2 + mv oscillator clock cycles

Memory: 1 program word
ASLL Multi-Bit Arithmetic Left Shift ASLL


Operation:                              Assembler Syntax:
S1 << S2 → D (no parallel move)         ASLL S1,S2,D (no parallel move)


Description: Arithmetically shift the first 16-bit source operand (S1) to the left by the value contained in the lowest
4 bits of the second source operand (S2) and store the result in the destination register. If the destination
is a 36-bit accumulator, correctly sign extend into the extension register (A2 or B2), and place zero in
the LSP (A0 or B0).
Example: 
ASLL Y1,X0,A

Before Execution            After Execution
A   0:3456:3456             A   F:AAA0:0000
    (A2:A1:A0)                  (A2:A1:A0)
Y1  AAAA                    Y1  AAAA
X0  0004                    X0  0004


Explanation of Example: 
Prior to execution, the Y1 register contains the value to be shifted ($AAAA) and the X0 register
contains the amount by which to shift ($0004). The contents of the destination register are not important
prior to execution because they have no effect on the calculated value. The ASLL instruction
arithmetically shifts the value $AAAA four bits to the left and places the result in the destination register A.
Since the destination is an accumulator, the extension word (A2) is filled with sign extension, and the
LSP (A0) is set to zero.
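
A minimal Python sketch of this behavior follows. It is a hypothetical model rather than a definitive description of the hardware: the function name asll is invented here, and the sketch assumes that the extension nibble is taken from bit 15 of the shifted 16-bit result, which agrees with the example above.

```python
def asll(s1, s2):
    """Model ASLL S1,S2,D for a 36-bit destination; return (ext, msp, lsp)."""
    shift = s2 & 0xF                       # only the lowest 4 bits of S2 are used
    msp = ((s1 & 0xFFFF) << shift) & 0xFFFF
    ext = 0xF if msp & 0x8000 else 0x0     # assumed: sign taken from the 16-bit result
    lsp = 0x0000                           # LSP (A0 or B0) is zeroed
    return ext, msp, lsp

ext, msp, lsp = asll(0xAAAA, 0x0004)
print(f"{ext:X}:{msp:04X}:{lsp:04X}")      # F:AAA0:0000
```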


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero


Note: If the CC bit is set, N is undefined and Z is set if the LSBs 31-0 are zero. 

Instruction Fields: 


Operation   Operands     C   W   Comments
ASLL        Y1,X0,FDD    2   1   Arithmetic shift left of the first operand by value
            Y0,X0,FDD            specified in four LSBs of the second operand;
            Y1,Y0,FDD            places result in FDD
            Y0,Y0,FDD
            A1,Y0,FDD
            B1,Y1,FDD

Timing: 2 oscillator clock cycles

Memory: 1 program word
ASR Arithmetic Shift Right ASR


Operation:                              Assembler Syntax:
(see following figure)                  ASR D (parallel move)

MSB held → [ D2 | D1 | D0 ] → C         (parallel move)


Description: Arithmetically shift the destination operand (D) 1 bit to the right and store the result in the destination
accumulator. The LSB of the destination prior to the execution of the instruction is shifted into C and
the MSB of the destination is held constant.


Example: 
ASR B X:(R2)+,Y0 ; divide B by 2, update R2, load Y0

Before Execution            After Execution
B   A:A864:A865             B   D:5432:5432
    (B2:B1:B0)                  (B2:B1:B0)
SR  0300                    SR  0329


Explanation of Example: 
Prior to execution, the 36-bit B accumulator contains the value $A:A864:A865. Execution of the
ASR B instruction shifts the 36-bit value in the B accumulator 1 bit to the right and stores the result
back in the B accumulator. C is set by the operation because bit 0 of B was set prior to the execution
of the instruction. The N bit of CCR (bit 3) is also set because bit 35 of the result in B is set. The E bit
of CCR (bit 5) is set because the signed integer portion of B is used by the result.
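
The right shift with a held sign bit can be modeled in a few lines. This Python sketch is illustrative only; the name asr36 and the single-integer packing of B2:B1:B0 are assumptions of the example, and only the C bit is modeled.

```python
MASK36 = (1 << 36) - 1
SIGN36 = 1 << 35

def asr36(acc):
    """Shift a 36-bit accumulator right by 1 bit; return (result, C)."""
    acc &= MASK36
    carry = acc & 1                          # old bit 0 is shifted into C
    result = (acc >> 1) | (acc & SIGN36)     # bit 35 (the MSB) is held constant
    return result, carry

result, c = asr36(0xAA864A865)               # $A:A864:A865 from the example
print(f"{result:09X} C={c}")                 # D54325432 C=1
```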

Condition Codes Affected:

 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if data limiting has occurred during parallel move
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Always cleared
C — Set if bit 0 of A or B was set prior to the execution of the instruction


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields: 


Operation   Operands   C   W   Comments
ASR         FDD        2   1   Arithmetic shift right entire register by 1 bit

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access   Source or Destination
ASR         A               X:(Rn)+         X0
            B               X:(Rn)+N        Y1
                                            Y0
                                            A1
                                            B1
                                            A
                                            B

Timing: 2 + mv oscillator clock cycles

Memory: 1 program word
ASRAC Arithmetic Right Shift with Accumulate ASRAC


Operation:                                  Assembler Syntax:
(S1 >> S2) + D → D (no parallel move)       ASRAC S1,S2,D (no parallel move)


Description: Arithmetically shift the first 16-bit source operand (S1) to the right by the value contained in the lowest
4 bits of the second source operand (S2) and accumulate the result with the value in the destination
register. If the destination is a 36-bit accumulator, correctly sign extend into the extension register (A2
or B2).


Usage: This instruction is typically used for multi-precision arithmetic right shifts. 
Example: 
ASRAC Y1,X0,A ; shift Y1 right by X0 and accumulate into A

Before Execution            After Execution
A   0:0000:0099             A   F:FC00:3099
    (A2:A1:A0)                  (A2:A1:A0)
Y1  C003                    Y1  C003
X0  0004                    X0  0004


Explanation of Example: 


Prior to execution, the Y1 register contains the value to be shifted ($C003), the X0 register contains
the amount by which to shift ($0004), and the destination accumulator contains $0:0000:0099. The
ASRAC instruction arithmetically shifts the value $C003 four bits to the right and accumulates this
result with the value already in the destination register A. Since the destination is an accumulator, the
extension word (A2) is filled with sign extension.
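
The arithmetic in this example can be reproduced with the following Python sketch. It is a hedged model, not the documented datapath: the name asrac is hypothetical, and the sketch assumes that S1 is treated as the upper word of a 32-bit value so that bits shifted out of the upper word land in the lower word, an assumption that matches the A0 value of $3099 shown above.

```python
MASK36 = (1 << 36) - 1

def asrac(s1, s2, acc):
    """Model ASRAC S1,S2,D for a 36-bit accumulator packed as A2:A1:A0."""
    shift = s2 & 0xF                   # only the lowest 4 bits of S2 are used
    value = s1 & 0xFFFF
    if value & 0x8000:                 # interpret S1 as a signed 16-bit word
        value -= 0x10000
    shifted = (value << 16) >> shift   # Python's >> is arithmetic on negative ints
    return (acc + shifted) & MASK36    # accumulate, wrap to 36 bits

result = asrac(0xC003, 0x0004, 0x000000099)
print(f"{result:09X}")                 # FFC003099, that is $F:FC00:3099
```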


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero


See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit Desti- 
nations—CC Bit Set,” on page 3-34 for the case when the CC bit is set. 

Instruction Fields: 


Operation   Operands   C   W   Comments
ASRAC       Y1,X0,F    2   1   Arithmetic word shifting with accumulation
            Y0,X0,F
            Y1,Y0,F
            Y0,Y0,F
            A1,Y0,F
            B1,Y1,F

Timing: 2 oscillator clock cycles

Memory: 1 program word
ASRR Multi-Bit Arithmetic Right Shift ASRR


Operation:                              Assembler Syntax:
S1 >> S2 → D (no parallel move)         ASRR S1,S2,D (no parallel move)


Description: Arithmetically shift the first 16-bit source operand (S1) to the right by the value contained in the lowest
4 bits of the second source operand (S2) and store the result in the destination register. If the destination
is a 36-bit accumulator, correctly sign extend into the extension register (A2 or B2), and place zero in
the LSP (A0 or B0).
Example: 
ASRR Y1,X0,A ; right shift of 16-bit Y1 by X0

Before Execution            After Execution
A   0:1234:5678             A   F:FAAA:0000
    (A2:A1:A0)                  (A2:A1:A0)
Y1  AAAA                    Y1  AAAA
X0  0004                    X0  0004


Explanation of Example: 
Prior to execution, the Y1 register contains the value to be shifted ($AAAA) and the X0 register
contains the amount by which to shift ($0004). The contents of the destination register are not important
prior to execution because they have no effect on the calculated value. The ASRR instruction
arithmetically shifts the value $AAAA four bits to the right and places the result in the destination register A.
Since the destination is an accumulator, the extension word (A2) is filled with sign extension, and the
LSP (A0) is set to zero.
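
For comparison with ASRAC, note that ASRR discards the bits shifted out of the 16-bit word and zeroes the LSP. A short Python sketch of that reading follows; the name asrr is hypothetical, and the model covers only the 36-bit accumulator destination.

```python
def asrr(s1, s2):
    """Model ASRR S1,S2,D for a 36-bit destination; return (ext, msp, lsp)."""
    shift = s2 & 0xF                     # only the lowest 4 bits of S2 are used
    value = s1 & 0xFFFF
    if value & 0x8000:                   # interpret S1 as a signed 16-bit word
        value -= 0x10000
    msp = (value >> shift) & 0xFFFF      # arithmetic shift, kept to 16 bits
    ext = 0xF if msp & 0x8000 else 0x0   # sign extension into A2 or B2
    return ext, msp, 0x0000              # LSP (A0 or B0) is zeroed

ext, msp, lsp = asrr(0xAAAA, 0x0004)
print(f"{ext:X}:{msp:04X}:{lsp:04X}")    # F:FAAA:0000
```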


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
Instruction Fields:

Operation   Operands     C   W   Comments
ASRR        Y1,X0,FDD    2   1   Arithmetic shift right of the first operand by value
            Y0,X0,FDD            specified in four LSBs of the second operand;
            Y1,Y0,FDD            places result in FDD
            Y0,Y0,FDD
            A1,Y0,FDD
            B1,Y1,FDD

Timing: 2 oscillator clock cycles

Memory: 1 program word
Bcc Branch Conditionally Bcc


Operation:                              Assembler Syntax:
If cc, then PC + label → PC             Bcc <OFFSET7>
else PC + 1 → PC


Description: If the specified condition is true, program execution continues at location PC + displacement. The PC
contains the address of the next instruction. If the specified condition is false, the PC is incremented,
and program execution continues sequentially. The offset is a 7-bit value that is sign extended to
16 bits. This instruction is more compact than the Jcc instruction, but can only be used to branch within
a small address range.


The term “cc” specifies the following:

“cc” Mnemonic                                Condition
CC (HS*) — carry clear (higher or same)      C = 0
CS (LO*) — carry set (lower)                 C = 1
EQ — equal                                   Z = 1
GE — greater than or equal                   N ⊕ V = 0
GT — greater than                            Z + (N ⊕ V) = 0
HI* — higher                                 C̄ • Z̄ = 1
LE — less than or equal                      Z + (N ⊕ V) = 1
LS* — lower or same                          C + Z = 1
LT — less than                               N ⊕ V = 1
NE — not equal                               Z = 0
NN — not normalized                          Z + (Ū • Ē) = 0
NR — normalized                              Z + (Ū • Ē) = 1

* Only available when CC bit set in the OMR
X̄ denotes the logical complement of X
+ denotes the logical OR operator
• denotes the logical AND operator
⊕ denotes the logical exclusive OR operator


Example:
BNE LABEL ; branch to label if Z condition clear
INCW A
INCW A
LABEL
ADD B,A


Explanation of Example:
In this example, if the Z bit is zero when executing the BNE instruction, program execution skips the
two INCW instructions and continues with the ADD instruction. If the specified condition is not true,
no branch is taken, the program counter is incremented by one, and program execution continues with
the first INCW instruction. The Bcc instruction uses a PC-relative offset of two for this example.


Restrictions:
A Bcc instruction used within a DO loop cannot begin at the LA or LA-1 within that DO loop.
A Bcc instruction cannot be repeated using the REP instruction.
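
The condition table for Bcc can be restated as a lookup over the CCR bits. The Python sketch below is an illustration under stated assumptions: the function name condition_true is hypothetical, the bit positions follow the status register layout used in this appendix (C in bit 0, V in bit 1, Z in bit 2, N in bit 3, U in bit 4, E in bit 5), and the complemented terms in HI, NN, and NR are assumed from the table's complement notation. The sketch does not check whether the CC bit in the OMR permits the starred mnemonics.

```python
def condition_true(cc, sr):
    """Evaluate a Bcc condition mnemonic against a status register value."""
    c = sr & 1
    v = (sr >> 1) & 1
    z = (sr >> 2) & 1
    n = (sr >> 3) & 1
    u = (sr >> 4) & 1
    e = (sr >> 5) & 1

    def inv(bit):
        return bit ^ 1                     # logical complement of one bit

    norm_term = z | (inv(u) & inv(e))      # shared by NN and NR (assumed form)
    table = {
        "CC": c == 0, "HS": c == 0,
        "CS": c == 1, "LO": c == 1,
        "EQ": z == 1,
        "GE": (n ^ v) == 0,
        "GT": (z | (n ^ v)) == 0,
        "HI": (inv(c) & inv(z)) == 1,
        "LE": (z | (n ^ v)) == 1,
        "LS": (c | z) == 1,
        "LT": (n ^ v) == 1,
        "NE": z == 0,
        "NN": norm_term == 0,
        "NR": norm_term == 1,
    }
    return table[cc.upper()]

print(condition_true("NE", 0x0300))        # True: Z is clear, so BNE branches
```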
Condition Codes Affected:
The condition codes are tested but not modified by this instruction.


Instruction Fields:

Operation   Operands     C     Comments
Bcc         <OFFSET7>    6/4   7-bit signed PC relative offset

Timing: 4 + jx oscillator clock cycles
Memory: 1 program word
BFCHG Test Bit Field and Change BFCHG


Operation:                                                                  Assembler Syntax:
complement of (<bit field> of destination) → (<bit field> of destination)   BFCHG #iiii,X:<ea>
complement of (<bit field> of destination) → (<bit field> of destination)   BFCHG #iiii,D


Description: Test all selected bits of the destination operand. If all selected bits are set, C is set; otherwise, C is 
cleared. Then complement the selected bits and store the result in the destination memory location. The 
bits to be tested are selected by a 16-bit immediate value in which every bit set is to be tested and 
changed. This instruction performs a read-modify-write operation on the destination memory location 
or register and requires two destination accesses. 


Usage: This instruction is very useful in performing I/O and flag bit manipulation. 
Example: 
BFCHG #$0310,X:<<$FFE2 ; test and change bits 4, 8, and 9
                       ; in a peripheral register

Before Execution            After Execution
X:$FFE2  0010               X:$FFE2  0300
SR       0001               SR       0000


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $0010. Execution of the
instruction tests the state of bits 4, 8, and 9 in X:$FFE2; does not set C (because not all of the
selected bits were set); and then complements the bits.
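
The test-then-complement sequence can be sketched in Python as follows. The function name bfchg is an assumption for illustration; the model covers only the 16-bit value and the C bit, not the memory accesses or the L bit.

```python
def bfchg(value, mask):
    """Model BFCHG on a 16-bit word; return (new_value, C)."""
    value &= 0xFFFF
    mask &= 0xFFFF
    carry = 1 if (value & mask) == mask else 0   # C set only if all selected bits are set
    return value ^ mask, carry                   # complement the selected bits

new_value, c = bfchg(0x0010, 0x0310)
print(f"{new_value:04X} C={c}")                  # 0300 C=0
```

With a mask of zero the comparison holds trivially, so the model returns the value unchanged with C set, in line with the note in the Condition Codes Affected section.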


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


For destination operand SR:
— Changed if specified in the field
For other destination operands:
L — Set if data limiting occurred during 36-bit source move
C — Set if all bits specified by the mask are set
    Clear if not all bits specified by the mask are set


Note: If all bits in the mask are set to zero, the destination is unchanged, and the C bit is set. 
Instruction Fields:

Operation   Operands
BFCHG       #xxxx,DDDDD
            #xxxx,X:(R2+xx)
            #xxxx,X:(SP-xx)
            #xxxx,X:aa
            #xxxx,X:pp
            #xxxx,X:xxxx

Comments:
BFCHG tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit is
set. Otherwise it is cleared. Then it inverts all selected bits.
All registers in DDDDD are permitted except HWS.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.


Timing: Refer to the preceding Instruction Fields table


Memory: Refer to the preceding Instruction Fields table
BFCLR Test Bit Field and Clear BFCLR


Operation:                              Assembler Syntax:
0 → (<bit field> of destination)        BFCLR #iiii,X:<ea>
0 → (<bit field> of destination)        BFCLR #iiii,D


Description: Test all selected bits of the destination operand. If all selected bits are set, C is set; otherwise, C is 
cleared. Then clear the selected bits and store the result in the destination memory location. The bits 
to be tested are selected by a 16-bit immediate value in which every bit set is to be tested and cleared. 
This instruction performs a read-modify-write operation on the destination memory location or register 
and requires two destination accesses. 


Usage: This instruction is very useful in performing I/O and flag bit manipulation. 


Example: 


BFCLR #$0310,X:<<$FFE2 ; test and clear bits 4, 8, and 9 in
                       ; an on-chip peripheral register


Before Execution            After Execution
X:$FFE2  7F95               X:$FFE2  7C85
SR       0001               SR       0001


Explanation of Example:
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $7F95. Execution of the
instruction tests the state of bits 4, 8, and 9 in X:$FFE2; sets C (because all of the selected bits
were set); and then clears the bits, leaving $7C85.


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


For destination operand SR:
— Cleared as defined in the field and if specified in the field
For other destination operands:
L — Set if data limiting occurred during 36-bit source move
C — Set if all bits specified by the mask are set
    Clear if not all bits specified by the mask are set


Note: If all bits in the mask are set to zero, the destination is unchanged, and the C bit is set. 
Instruction Fields:

Operation   Operands
BFCLR       #xxxx,DDDDD
            #xxxx,X:(R2+xx)
            #xxxx,X:(SP-xx)
            #xxxx,X:aa
            #xxxx,X:pp
            #xxxx,X:xxxx

Comments:
BFCLR tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit is
set. Otherwise it is cleared. Then it clears all selected bits.
All registers in DDDDD are permitted except HWS.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table
BFSET Test Bit Field and Set BFSET


Operation:                              Assembler Syntax:
1 → (<bit field> of destination)        BFSET #iiii,X:<ea>
1 → (<bit field> of destination)        BFSET #iiii,D


Description: Test all selected bits of the destination operand. If all selected bits are set, C is set; otherwise, C is 
cleared. Then set the selected bits, and store the result in the destination memory location. The bits to 
be tested are selected by a 16-bit immediate value in which every bit set is to be tested and set. This 
instruction performs a read-modify-write operation on the destination memory location or register and 
requires two destination accesses. 


Usage: This instruction is very useful in performing I/O and flag bit manipulation. 
Example: 
BFSET #$F400,X:<<$FFE2

Before Execution            After Execution
X:$FFE2  8921               X:$FFE2  FD21
SR       0000               SR       0000


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $8921. Execution of the
instruction tests the state of bits 10, 12, 13, 14, and 15 in X:$FFE2; does not set C (because not all of
the selected bits were set); and then sets the bits.


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


For destination operand SR:
— Set as defined in the field and if specified in the field
For other destination operands:
L — Set if data limiting occurred during 36-bit source move
C — Set if all bits specified by the mask are set
    Clear if not all bits specified by the mask are set


Note: If all bits in the mask are set to zero, the destination is unchanged, and the C bit is set. 
Instruction Fields:

Operation   Operands           C   W   Comments
BFSET       #xxxx,DDDDD        4   2   BFSET tests all bits selected by the 16-bit immediate
            #xxxx,X:(R2+xx)    6   2   mask. If all selected bits are set, then the C bit
            #xxxx,X:(SP-xx)    6   2   is set. Otherwise it is cleared. Then it sets all
            #xxxx,X:aa         4   2   selected bits.
            #xxxx,X:pp         4   2
            #xxxx,X:xxxx       6   3

All registers in DDDDD are permitted except HWS.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table
BFTSTH Test Bit Field High BFTSTH


Operation:                                  Assembler Syntax:
Test <bit field> of destination for ones    BFTSTH #iiii,X:<ea>
Test <bit field> of destination for ones    BFTSTH #iiii,D


Description: Test all selected bits of the destination operand. If all selected bits are set, C is set; otherwise, C is 
cleared. The bits to be tested are selected by a 16-bit immediate value in which every bit set is to be 
tested. This instruction performs two destination accesses. 


Usage: This instruction is very useful for testing I/O and flag bits. 
Example: 
BFTSTH #$0310,X:<<$FFE2 ; test high bits 4, 8, and 9 in
                        ; an on-chip peripheral register

Before Execution            After Execution
X:$FFE2  0FF0               X:$FFE2  0FF0
SR       0000               SR       0001


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $0FF0. Execution of the
instruction tests the state of bits 4, 8, and 9 in X:$FFE2 and sets C (because all of the selected bits were set).


Condition Codes Affected: 




 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


L — Set if data limiting occurred during 36-bit source move 
C — Set if all bits specified by the mask are set 
Clear if not all bits specified by the mask are set 


Note: If all bits in the mask are set to zero, the destination is unchanged, and the C bit is set. 
Instruction Fields:

Operation   Operands
BFTSTH      #xxxx,DDDDD
            #xxxx,X:(R2+xx)
            #xxxx,X:(SP-xx)
            #xxxx,X:aa
            #xxxx,X:pp
            #xxxx,X:xxxx

Comments:
BFTSTH tests all bits selected by the 16-bit immediate mask. If all selected bits are set, then the C bit is
set. Otherwise it is cleared.
All registers in DDDDD are permitted except HWS.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table
BFTSTL Test Bit Field Low BFTSTL


Operation:                                  Assembler Syntax:
Test <bit field> of destination for zeros   BFTSTL #iiii,X:<ea>
Test <bit field> of destination for zeros   BFTSTL #iiii,D


Description: Test all selected bits of the destination operand. If all selected bits are clear, C is set; otherwise, C is 
cleared. The bits to be tested are selected by a 16-bit immediate value in which every bit set is to be 
tested. This instruction performs two destination accesses. 


Usage: This instruction is very useful for testing I/O and flag bits. 
Example: 
BFTSTL #$0310,X:<<$FFE2 ; test low bits 4, 8, and 9 in
                        ; an on-chip peripheral register

Before Execution            After Execution
X:$FFE2  18EC               X:$FFE2  18EC
SR       0000               SR       0001


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $18EC. Execution of the
instruction tests the state of bits 4, 8, and 9 in X:$FFE2 and sets C (because all of the selected bits were
clear).
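
The two non-destructive tests, BFTSTH and BFTSTL, differ only in the comparison applied to the masked value. A minimal Python sketch follows; the function names are hypothetical and only the C bit is modeled.

```python
def bftsth(value, mask):
    """Return C for BFTSTH: 1 if every bit selected by mask is set."""
    mask &= 0xFFFF
    return 1 if (value & mask) == mask else 0

def bftstl(value, mask):
    """Return C for BFTSTL: 1 if every bit selected by mask is clear."""
    mask &= 0xFFFF
    return 1 if (value & mask) == 0 else 0

print(bftsth(0x0FF0, 0x0310))   # 1: bits 4, 8, and 9 are all set
print(bftstl(0x18EC, 0x0310))   # 1: bits 4, 8, and 9 are all clear
```

Both functions return 1 for a mask of zero, which corresponds to the note that C is set when no bits are selected.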


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


L — Set if data limiting occurred during 36-bit source move 
C — Set if all bits specified by the mask are cleared 
Clear if not all bits specified by the mask are cleared 


Note: If all bits in the mask are set to zero, the destination is unchanged, and the C bit is set. 


Instruction Fields: 


Operation   Operands           C   W   Comments
BFTSTL      #xxxx,DDDDD        4   2   BFTSTL tests all bits selected by the 16-bit immediate
            #xxxx,X:(R2+xx)    6   2   mask. If all selected bits are clear, then the C bit
            #xxxx,X:(SP-xx)    6   2   is set. Otherwise it is cleared.
            #xxxx,X:aa         4   2
            #xxxx,X:pp         4   2
            #xxxx,X:xxxx       6   3

All registers in DDDDD are permitted except HWS.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table 
BRA Branch BRA


Operation:                              Assembler Syntax:
PC + label → PC                         BRA <OFFSET7>


Description: Branch to the location in program memory at PC + displacement. The PC contains the address of the
next instruction. The displacement is a 7-bit signed value that is sign extended to form the PC-relative
offset.


Example:
BRA LABEL
INCW A
INCW A
LABEL
ADD B,A


Explanation of Example:
In this example, program execution skips the two INCW instructions and continues with the ADD
instruction. The BRA instruction uses a PC-relative offset of two for this example.
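
The target address calculation described for BRA can be sketched in Python as shown below. The function names and the address $0100 are hypothetical values chosen for illustration; the sketch assumes a 16-bit program counter that already points to the instruction following the branch.

```python
def sign_extend7(offset):
    """Sign extend a 7-bit displacement to a Python integer."""
    offset &= 0x7F
    return offset - 0x80 if offset & 0x40 else offset

def branch_target(next_pc, offset7):
    """Return the 16-bit address reached from next_pc with a 7-bit offset."""
    return (next_pc + sign_extend7(offset7)) & 0xFFFF

# A one-word BRA at the hypothetical address $0100 leaves the PC at $0101.
# An offset of two skips two one-word INCW instructions.
print(f"{branch_target(0x0101, 2):04X}")   # 0103
```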


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Restrictions: 


A BRA instruction used within a DO loop cannot begin at the LA or LA-1 within that DO loop. 


A BRA instruction cannot be repeated using the REP instruction. 


Instruction Fields: 


Operation   Operands     C   W   Comments
BRA         <OFFSET7>    6   1   7-bit signed PC relative offset

Timing: 6 + jx oscillator clock cycles
Memory: 1 program word
BRCLR Branch if Bits Clear BRCLR


Operation:                                          Assembler Syntax:
Branch if <bit field> of destination is all zeros   BRCLR #iiii,X:<ea>,aa
Branch if <bit field> of destination is all zeros   BRCLR #iiii,D,aa


Description: Test all selected bits of the destination operand. If all the selected bits are clear, C is set, and program 
execution continues at the location in program memory at PC + displacement. Otherwise, C is cleared 
and execution continues with the next sequential instruction. The bits to be tested are selected by an 
8-bit immediate value in which every bit set is to be tested. 


Usage: This instruction is useful in performing I/O flag polling. 
Example: 
BRCLR #$0013,X:<<$FFE2,LABEL
INCW A
INCW A
LABEL
ADD B,A

Before Execution            After Execution
X:$FFE2  18EC               X:$FFE2  18EC
SR       0000               SR       0001


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $18EC. Execution of the
instruction tests the state of bits 4, 1, and 0 in X:$FFE2 and sets C (because all of the selected bits were clear).
Since C is set, program execution is transferred to the address offset from the current program counter
by the displacement specified in the instruction (the two INCW instructions are not executed).


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


L — Set if data limiting occurred during 36-bit source move 
C — Set if all bits specified by the mask are cleared 
Clear if not all bits specified by the mask are cleared 


Note: If all bits in the mask are set to zero, C is set, and the branch is taken. 
Instruction Fields:

Operation   Operands                C       W   Comments
BRCLR       #MASK8,DDDDD,AA         10/8    2   BRCLR tests all bits selected by the immediate
            #MASK8,X:(R2+xx),AA     12/10   2   mask. If all selected bits are clear, then the carry
            #MASK8,X:(SP-xx),AA     12/10   2   bit is set and a PC relative branch occurs. Otherwise
            #MASK8,X:aa,AA          10/8    2   it is cleared and no branch occurs.
            #MASK8,X:pp,AA          10/8    2
            #MASK8,X:xxxx,AA        12/10   3

All registers in DDDDD are permitted except HWS.
MASK8 specifies a 16-bit immediate value where either the upper or lower 8 bits contain all zeros.
AA specifies a 7-bit PC relative offset.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table 
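
The BRCLR decision and the MASK8 constraint from the Instruction Fields table can be sketched together in Python. The function names are hypothetical, and raising an error for an invalid mask is a choice made for this illustration; the manual does not state how an assembler reports such a mask.

```python
def is_valid_mask8(mask):
    """True if the upper or the lower 8 bits of the 16-bit mask are all zero."""
    mask &= 0xFFFF
    return (mask & 0xFF00) == 0 or (mask & 0x00FF) == 0

def brclr(value, mask):
    """Model BRCLR; return (C, branch_taken)."""
    if not is_valid_mask8(mask):
        raise ValueError("mask does not fit the MASK8 form")
    carry = 1 if (value & mask) == 0 else 0   # all selected bits clear
    return carry, carry == 1                  # the branch follows C

print(brclr(0x18EC, 0x0013))                  # (1, True)
```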
BRSET Branch if Bits Set BRSET


Operation:                                          Assembler Syntax:
Branch if <bit field> of destination is all ones    BRSET #iiii,X:<ea>,aa
Branch if <bit field> of destination is all ones    BRSET #iiii,D,aa


Description: Test all selected bits of the destination operand. If all the selected bits are set, C is set, and program 
execution continues at the location in program memory at PC + displacement. Otherwise, C is cleared, 
and execution continues with the next sequential instruction. The bits to be tested are selected by an 
8-bit immediate value in which every bit set is to be tested. 


Usage: This instruction is useful in performing I/O flag polling. 
Example: 
BRSET #$00F0,X:<<$FFE2,LABEL
INCW A
INCW A
LABEL
ADD B,A

Before Execution            After Execution
X:$FFE2  0FF0               X:$FFE2  0FF0
SR       0000               SR       0001


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE2 contains the value $0FF0. Execution of the
instruction tests the state of bits 4, 5, 6, and 7 in X:$FFE2 and sets C (because all of the selected bits were
set). Since C is set, program execution is transferred to the address offset from the current program
counter by the displacement specified in the instruction (the two INCW instructions are not executed).


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


L — Set if data limiting occurred during 36-bit source move 
C — Set if all bits specified by the mask are set 
Clear if not all bits specified by the mask are set 


Note: If all bits in the mask are set to zero, C is set and the branch is taken. 
Instruction Fields:

Operation   Operands                C       W   Comments
BRSET       #MASK8,DDDDD,AA         10/8    2   BRSET tests all bits selected by the immediate
            #MASK8,X:(R2+xx),AA     12/10   2   mask. If all selected bits are set, then the carry bit
            #MASK8,X:(SP-xx),AA     12/10   2   is set and a PC relative branch occurs. Otherwise
            #MASK8,X:aa,AA          10/8    2   it is cleared and no branch occurs.
            #MASK8,X:pp,AA          10/8    2
            #MASK8,X:xxxx,AA        12/10   3

All registers in DDDDD are permitted except HWS.
MASK8 specifies a 16-bit immediate value where either the upper or lower 8 bits contain all zeros.
AA specifies a 7-bit PC relative offset.
X:aa represents a 6-bit absolute address. Refer to Absolute Short Address (Direct Addressing):
<aa> on page 4-22.
X:pp represents a 6-bit absolute I/O address. Refer to I/O Short Address (Direct Addressing): <pp>
on page 4-23.
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table 
CLR Clear Accumulator CLR


Operation:                              Assembler Syntax:
0 → D (parallel move)                   CLR D (parallel move)


Description: Clear the destination register.


Implementation Note:
When a 16-bit register is used as the operand for CLR, this instruction is actually assembled as a
MOVE #0,<register> instruction. It will disassemble as MOVE.


Example: 
CLR A A,X:(R0)+ ; save A into X data memory before
                ; clearing it

Before Execution            After Execution
A   2:3456:789A             A   0:0000:0000
    (A2:A1:A0)                  (A2:A1:A0)


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $2:3456:789A. Execution of the 
CLR A instruction clears the 36-bit A accumulator to zero. 


Condition Codes Affected: 


 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


SZ — Set according to the standard definition of the SZ bit (parallel move) 
L — Set if data limiting has occurred during parallel move 

E — Always cleared if destination is a 36-bit accumulator 

U — Always set if destination is a 36-bit accumulator 

N — Always cleared if destination is a 36-bit accumulator 

Z — Always set if destination is a 36-bit accumulator 

V — Always cleared if destination is a 36-bit accumulator 

Note: The condition codes are only affected if the destination of the CLR instruction is one of the two 36-bit
accumulators (A or B).
Instruction Fields:

Operation   Operands   C   W   Comments
CLR         F          2   1   Clear 36-bit accumulator and set condition codes.
            F1DD       2   1   Identical to move #0,<reg>; does not set condition
            Rj                 codes.
            N

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access   Source or Destination
CLR         A               X:(Rn)+         X0
            B               X:(Rn)+N        Y1
                                            Y0
                                            A
                                            B
                                            A1
                                            B1
Timing: 2 + mv oscillator clock cycles 
Memory: 1 program word 
CMP Compare CMP


Operation: Assembler Syntax: 
D-S (parallel move) CMP S,D (parallel move) 


Description: Subtract the two operands and update the CCR. The result of the subtraction operation is not stored. 
Usage: This instruction can be used for both integer and fractional two’s-complement data. 


Note: This instruction subtracts 36-bit operands. When a word is specified as the source, it is sign extended
and zero filled to form a valid 36-bit operand. In order for C to be set correctly as a result of the
subtraction, the destination must be properly sign extended. The destination can be improperly sign
extended by writing A1 or B1 explicitly prior to executing the compare, so that A2 or B2, respectively,
may not represent the correct sign extension. This note particularly applies to the case in which the
source is extended to compare 16-bit operands, such as X0 with A1.


Example: 
CMP Y0,A X0,X:(R1)+N ; compare Y0 and A, save X0,
                     ; update R1

Before Execution After Execution
0 0020 0000 0 0020 0000
A2 A1 A0 A2 A1 A0
Y0 0024 Y0 0024
SR 0300 SR 0319


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $0:0020:0000, and the 16-bit Y0
register contains the value $0024. Execution of the CMP Y0,A instruction automatically appends the
16-bit value in the Y0 register with 16 LS zeros, sign extends the resulting 32-bit long word to 36 bits,
subtracts the result from the 36-bit A accumulator, and updates the CCR (leaving the A accumulator
unchanged).
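The operand extension and CCR update in this example can be sketched as a small Python behavioral model. This is an illustration only, not a description of the hardware: the function names are hypothetical, and the E and U tests used here (E set when bits 35-31 are not all equal, U set when bits 31 and 30 are equal) are assumed from the general condition-code definitions rather than stated on this page. SZ and L are not modeled.

```python
MASK36 = (1 << 36) - 1

def extend_word(word16):
    """Place a 16-bit word in bits 31-16, zero fill bits 15-0, sign extend to 36 bits."""
    value = (word16 & 0xFFFF) << 16
    if word16 & 0x8000:
        value |= 0xF << 32
    return value

def to_signed36(value):
    """Interpret a 36-bit pattern as a two's-complement integer."""
    return value - (1 << 36) if value & (1 << 35) else value

def cmp_ccr(dest36, src36):
    """Return SR bits 5-0 (E, U, N, Z, V, C) for D - S; D itself is not modified."""
    result = (dest36 - src36) & MASK36
    exact = to_signed36(dest36) - to_signed36(src36)
    ext = result >> 31                                   # bits 35-31
    e = int(ext not in (0b00000, 0b11111))               # assumed E definition
    u = int(((result >> 31) & 1) == ((result >> 30) & 1))  # assumed U definition
    n = (result >> 35) & 1
    z = int(result == 0)
    v = int(exact != to_signed36(result))
    c = int(dest36 < src36)                              # borrow out of bit 35
    return (e << 5) | (u << 4) | (n << 3) | (z << 2) | (v << 1) | c

# Values from the example: A = $0:0020:0000, Y0 = $0024, SR = $0300.
sr_before = 0x0300
ccr = cmp_ccr(0x0_0020_0000, extend_word(0x0024))
sr_after = (sr_before & 0xFFC0) | ccr
print(f"SR = ${sr_after:04X}")
```

Under these assumptions the model sets U, N, and C, which agrees with the SR value of $0319 shown in the example.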
Condition Codes Affected:

15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of the result is in use
U — Set if result is not normalized
N — Set if bit 35 of the result is set except during saturation
Z — Set if result equals zero
V — Set if overflow has occurred in result
C — Set if a carry (or borrow) occurs from bit 35 of the result


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4,
“20-Bit Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields: 


Operation  Operands        C  W  Comments
CMP        DD,FDD          2  1  36-bit compare of two accumulators or data reg
           F1,DD
           ~F,F
           X:(SP-xx),FDD   6  1  Compare memory word with 36-bit accumulator.
           X:aa,FDD        4  1  X:aa represents a 6-bit absolute address. Refer to
           X:xxxx,FDD      6  2  Absolute Short Address (Direct Addressing):
                                 <aa> on page 4-22.
                                 Note: Condition codes set based on 36-bit result
           #xx,FDD         4  1  Compare acc with an immediate integer 0-31
           #xxxx,FDD       6  2  Compare acc with a signed 16-bit immediate

Data ALU Operation      Parallel Memory Read or Write
Operation  Registers    Memory Access   Source or Destination
CMP        X0,F         X:(Rn)+         X0
           Y1,F         X:(Rn)+N        Y1
           Y0,F                         Y0
                                        A
           A,B                          B
           B,A                          A1
                                        B1
(F = A or B)
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table 
DEBUG Enter Debug Mode DEBUG


Operation: Assembler Syntax:
Enter the debug processing state DEBUG


Description: Enter the debug processing state if the PWD bit is clear in the OnCE port’s OCR register, and wait for 
OnCE commands. If this bit is not clear, then the processor simply executes two NOPs and continues 
program execution. 


Condition Codes Affected: 


No condition codes are affected. 


Instruction Fields: 


Operation  Operands  C  W  Comments
DEBUG                4  1  Generate a debug event
Timing: 4 oscillator clock cycles 
Memory: 1 program word 
DEC(W) Decrement Word DEC(W)


Operation: Assembler Syntax:
D2:D1 - 1 → D2:D1 (parallel move) DECW D (parallel move)


Description: Decrement a 16-bit destination or the two upper portions (A2:A1 or B2:B1) of a 36-bit accumulator. 
If the destination is a 36-bit accumulator, leave the LSP (A0 or B0) unchanged.


Usage: This instruction is typically used when processing integer data. 
Example: 
DECW A X:(R2)+,X0 ; Decrement the 20 MSBs of A and then
                  ; update R2,X0
A Before Execution A After Execution
0 0001 0033 0 0000 0033
A2 A1 A0 A2 A1 A0


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $0:0001:0033. Execution of the 
DECW A instruction decrements by one the upper 20 bits of the A accumulator. 
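As an informal illustration, the accumulator case can be modeled in Python. The function name is hypothetical, and the sketch covers only the arithmetic on A2:A1 and the Z rule quoted in the condition-code list for this instruction (Z looks at the 20 MSBs only); limiting and the other flags are left out.

```python
def decw(acc36):
    """Decrement A2:A1 (bits 35-16) by one; A0 (bits 15-0) passes through unchanged."""
    upper20 = ((acc36 >> 16) - 1) & 0xFFFFF
    result = (upper20 << 16) | (acc36 & 0xFFFF)
    z = int(upper20 == 0)   # Z reflects the 20 MSBs only, not A0
    return result, z

# Value from the example: A = $0:0001:0033.
result, z = decw(0x0_0001_0033)
print(f"${result >> 32:X}:{(result >> 16) & 0xFFFF:04X}:{result & 0xFFFF:04X}  Z={z}")
```

With the example value, the model leaves A0 at $0033 and reports Z = 1 even though the full 36-bit accumulator is nonzero.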


Condition Codes Affected: 


15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of the result is in use
U — Set if result is unnormalized
N — Set if bit 35 of the result is set except during saturation
Z — Set if the 20 MSBs of the result are all zeros
V — Set if overflow has occurred in result
C — Set if a carry (or borrow) occurs from bit 35 of the result
See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4,
“20-Bit Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields: 


Operation  Operands    C  W  Comments
DEC(W)     FDD         2  1  Decrement word
           X:(SP-xx)   8  1  Decrement word in memory using appropriate
           X:aa        6  1  addressing mode.
           X:xxxx      8  2  X:aa represents a 6-bit absolute address. Refer to
                             Absolute Short Address (Direct Addressing):
                             <aa> on page 4-22.
Data ALU Operation      Parallel Memory Read or Write
Operation  Registers    Memory Access   Source or Destination
DEC(W)     A            X:(Rn)+         X0
           B            X:(Rn)+N        Y1

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table
DIV Divide Iteration DIV


Operation: Assembler Syntax:
(see following figure) DIV S,D (no parallel move)


If D[35] ⊕ S[15] = 1
Then
    D2:D1:D0 shifted left 1 bit, with C shifted into bit 0; D1 + S → D1
Else
    D2:D1:D0 shifted left 1 bit, with C shifted into bit 0; D1 - S → D1


Description: This instruction is a divide iteration used to calculate 1 bit of the result of a division. After the correct
number of iterations, this will divide the destination operand D (the dividend or numerator) by the
source operand S (the divisor or denominator) and store the result in the destination accumulator. The
32-bit dividend must be a positive value that is correctly sign extended to 36 bits and is stored in the
full 36-bit destination accumulator. The 16-bit divisor is a signed value and is stored in the source
operand. (Division of signed numbers is handled using the techniques in Section 8.4, “Division,” on page
8-13.) This instruction can be used for both integer and fractional division. Each DIV iteration
calculates one quotient bit using a non-restoring division algorithm (see the description that follows). After
execution of the first DIV instruction, the destination operand holds both the partial remainder and the
formed quotient. The partial remainder occupies the high-order portion of the destination accumulator
D and is a signed fraction. The formed quotient occupies the low-order portion of the destination
accumulator D (A0 or B0) and is a positive fraction. One bit of the formed quotient is shifted into the
LSB of the destination accumulator at the start of each DIV iteration. The formed quotient is the true
quotient if the true quotient is positive. If the true quotient is negative, the formed quotient must be
negated. For fractional division, valid results are obtained only when |D| < |S|. This condition ensures
that the magnitude of the quotient is less than one (is fractional) and precludes division by zero.


The DIV instruction calculates one quotient bit based on the divisor and the previous partial remainder.
To produce an N-bit quotient, the DIV instruction is executed N times, where N is the number of bits
of precision desired in the quotient (1 ≤ N ≤ 16). Thus, for a full precision (16-bit) quotient, 16 DIV
iterations are required. In general, executing the DIV instruction N times produces an N-bit quotient
and a 32-bit remainder, which has (32 - N) bits of precision and whose N MSBs are zeros. The partial
remainder is not a true remainder and must be corrected (due to the non-restoring nature of the division
algorithm) before it may be used. Therefore, once the divide is complete, it is necessary to reverse the
last DIV operation and restore the remainder to obtain the true remainder. The DIV instruction uses a
non-restoring division algorithm that consists of the following operations:


1) Compare the source and destination operand sign bits. An exclusive OR operation is performed on
bit 35 of the destination operand and bit 15 of the source operand.

2) Shift the partial remainder and the quotient. The 36-bit destination accumulator is shifted 1 bit to
the left. C is moved into the LSB (bit 0) of the accumulator.

3) Calculate the next quotient bit and the new partial remainder. The 16-bit source operand (signed
divisor) is either added to or subtracted from the MSP of the destination accumulator (A1 or B1), and the
result is stored back into the MSP of the destination accumulator. If the result of the exclusive OR
operation described previously was one (that is, the sign bits were different), the source operand S is
added to the accumulator. If the result of the exclusive OR operation was zero (that is, the sign bits were
the same), the source operand S is subtracted from the accumulator. Due to the automatic sign
extension of the 16-bit signed divisor, the addition or subtraction operation correctly sets C with the next
quotient bit.
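The three operations above can be sketched as a Python behavioral model of one DIV iteration. This is an illustration under stated assumptions, not the hardware implementation: the function names are hypothetical, the accumulator is held as an unsigned 36-bit integer, C is assumed to be cleared before the first iteration, and the final remainder correction described earlier is not performed.

```python
MASK36 = (1 << 36) - 1

def div_step(acc36, divisor16, carry):
    """One DIV iteration; returns the new 36-bit accumulator and the new C bit."""
    # 1) Compare the sign bits: D[35] XOR S[15].
    signs_differ = ((acc36 >> 35) ^ (divisor16 >> 15)) & 1
    # 2) Shift the accumulator left 1 bit, moving C into bit 0.
    acc36 = ((acc36 << 1) | carry) & MASK36
    # 3) Add or subtract the sign-extended divisor, aligned with the MSP.
    s = (divisor16 & 0xFFFF) << 16
    if divisor16 & 0x8000:
        s |= 0xF << 32
    acc36 = (acc36 + s if signs_differ else acc36 - s) & MASK36
    # C, the next quotient bit, is set if bit 35 of the result is cleared.
    carry = ((acc36 >> 35) & 1) ^ 1
    return acc36, carry

def divide_iterations(dividend36, divisor16, n=16):
    """Run n DIV iterations starting with C = 0 (assumed cleared beforehand)."""
    acc, carry = dividend36, 0
    for _ in range(n):
        acc, carry = div_step(acc, divisor16, carry)
    return acc

# Fractional 0.25 / 0.5: dividend $0:2000:0000, divisor $4000.
acc = divide_iterations(0x0_2000_0000, 0x4000)
quotient = acc & 0xFFFF          # formed quotient in the low-order word
print(f"quotient = ${quotient:04X}")
```

For these positive operands the low-order word ends up as $4000 (0.5 as a 16-bit fraction). The upper portion still holds the uncorrected partial remainder, which is why the text calls for reversing the last DIV operation before the remainder is used.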


Explanation of Example: 
The DIV iteration instruction can be used in one of several different division algorithms, depending on
the needs of an application. Section 8.4, “Division,” on page 8-13 shows the correct usage of this
instruction for fractional and integer division routines, discusses in detail issues related to division, and
provides several examples. The division routine is greatly simplified if both operands are positive, or
if it is not necessary to also calculate a remainder.


Condition Codes Affected: 


15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C


L — Set if overflow bit V is set 

V — Set if the MSB of the destination operand is changed as a result of the 
instruction’s left shift operation 

C — Set if bit 35 of the result is cleared 


Instruction Fields: 


Operation  Operands  C  W  Comments
DIV        DD,F      2  1  Divide iteration
Timing: 2 oscillator clock cycles 
Memory: 1 program word 
DO Start Hardware Do Loop DO


Operation upon Executing DO Instruction: Assembler Syntax:
HWS[0] → HWS[1]; #xx → LC DO #xx,expr
PC → HWS[0]; LF → NL; expr → LA
1 → LF

HWS[0] → HWS[1]; S → LC DO S,expr
PC → HWS[0]; LF → NL; expr → LA
1 → LF


Operation When Loop Completes (End-of-Loop Processing):
NL → LF
HWS[1] → HWS[0]; 0 → NL


Description: Begin a hardware DO loop that is to be repeated the number of times specified in the instruction’s
source operand, and whose range of execution is terminated by the destination operand (shown
previously as “expr”). No overhead other than the execution of this DO instruction is required to set up this
loop. DO loops can receive their loop count as an immediate value or as a variable stored in an on-chip
register. When executing a DO loop, the instructions are actually fetched each time through the loop.
Therefore, a DO loop can be interrupted.


During the first instruction cycle, the DO instruction’s source operand is loaded into the 13-bit LC
register, and the second location in the HWS receives the contents of the first location. The LC register
stores the remaining number of times the DO loop will be executed and can be accessed from inside
the DO loop as a loop count variable subject to certain restrictions. The DO instruction allows all
registers on the DSP core to specify the number of loop iterations, except for the following: M01, HWS,
OMR, and SR. If immediate short data is instead used to specify the loop count, the 6 LSBs of the LC
register are loaded from the instruction, and the upper 7 MSBs are cleared.


During the second instruction cycle, the current contents of the PC are pushed onto the HWS. The DO
instruction’s destination address (shown as “expr”) is then loaded into the LA register. This 16-bit
operand is located in the instruction’s 16-bit absolute address extension word (as shown in the opcode
section). The value in the PC pushed onto the HWS is the address of the first instruction following the
DO instruction (that is, the first actual instruction in the DO loop). At the bottom of the loop, when it
is necessary to return to the top for another loop pass, this value is read (that is, copied but not pulled)
from the top of the HWS and loaded into the PC.


During the third instruction cycle, the LF is set. The PC is repeatedly compared with LA to determine
if the last instruction in the loop has been fetched. If LA equals PC, the last instruction in the loop has
been fetched and the LC is tested. If LC is not equal to one, it is decremented by one, and the top of the
HWS is loaded into the PC to fetch the first instruction in the loop again. If LC equals one, the
end-of-loop processing begins.


During the end-of-loop processing, the NL bit is written into the LF, and the NL bit is cleared. The 
contents of the second HWS location are written into the first HWS location. Instruction fetches now 
continue at the address of the instruction that follows the last instruction in the DO loop. 


DO loops can also be nested as shown in Section 8.6, “Loops,” on page 8-20. When DO loops are
nested, the end-of-loop addresses must also be nested and are not allowed to be equal. The assembler
generates an error message when DO loops are improperly nested.
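The bookkeeping for nested loops can be illustrated with a small Python model that tracks only the state named in the Operation summary for this instruction: the two HWS locations, LF, and NL. The class and method names are hypothetical, and LC and LA are deliberately left out (nested loops must save and restore those in software, which this sketch does not show).

```python
class LoopState:
    """Models only HWS[0], HWS[1], LF, and NL as listed in the Operation summary."""

    def __init__(self):
        self.hws = [0, 0]   # hws[0] is the first (top) HWS location
        self.lf = 0
        self.nl = 0

    def do(self, first_loop_address):
        # HWS[0] -> HWS[1]; PC -> HWS[0]; LF -> NL; 1 -> LF
        self.hws[1] = self.hws[0]
        self.hws[0] = first_loop_address
        self.nl = self.lf
        self.lf = 1

    def end_of_loop(self):
        # NL -> LF; HWS[1] -> HWS[0]; 0 -> NL
        self.lf = self.nl
        self.hws[0] = self.hws[1]
        self.nl = 0

state = LoopState()
state.do(0x0102)        # outer loop; the addresses are arbitrary illustrations
state.do(0x0206)        # inner loop: NL now records that an outer loop is active
state.end_of_loop()     # inner loop done: LF stays set for the outer loop
state.end_of_loop()     # outer loop done: LF and NL are both clear
print(state.lf, state.nl)
```

The point of the sketch is that NL is what allows LF to remain set after the inner loop finishes, so the outer loop's end-of-loop comparison keeps working.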
Note: The assembler calculates the end-of-loop address to be loaded into LA by evaluating the end-of-loop
“expr” and subtracting one. This is done to accommodate the case in which the last word in the DO
loop is a two-word instruction. Thus, the end-of-loop expression “expr” in the source code must
represent the address of the instruction after the last instruction in the loop.


Note: The LF is cleared by a hardware reset.


Note: Due to pipelining, if an address register (R0-R3, SP, or M01) is changed using a move-type instruction
(LEA, Tcc, MOVE, MOVEC, MOVEP, or parallel move), the new contents of the destination address
register will not be available for use during the following instruction (that is, there is a single
instruction cycle pipeline delay). This restriction also applies to the situation in which the last instruction in
a DO loop changes an address register and the first instruction at the top of the DO loop uses that same
address register. The top instruction becomes the following instruction due to the loop construct.


Note: If the A or B accumulator is specified as a source operand, and the data from the accumulator indicates
that extension is used, the value to be loaded into the LC register will be limited to a 16-bit maximum
positive or negative saturation constant. If positive saturation occurs, the limiter places $7FFF onto the
bus, and the lower 13 bits of this value are all ones. The thirteen ones are loaded into the LC register
as the maximum unsigned positive loop count allowed. If negative saturation occurs, the limiter places
$8000 onto the bus, and the lower 13 bits of this value are all zeros. The thirteen zeros are loaded into
the LC register, specifying a loop count of zero. The A and B accumulators remain unchanged.


Note: If LC is zero upon entering the DO loop, the loop is executed 2^13 times. To avoid this, use the software
technique outlined in Section 8.6, “Loops,” on page 8-20.
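The loop-count behavior described in these notes follows from the end-of-loop test (LC is compared with one, then decremented in a 13-bit register). A short Python sketch makes the wraparound visible. Both function names are hypothetical, and the test for "extension is used" (bits 35-31 of the accumulator not all equal) is an assumption about the limiter rather than something stated on this page.

```python
LC_MASK = 0x1FFF   # LC is a 13-bit register

def loop_passes(lc_initial):
    """Count how many times the loop body runs for a given initial LC value."""
    lc = lc_initial & LC_MASK
    passes = 0
    while True:
        passes += 1                 # the loop body executes once
        if lc == 1:                 # LC equals one: end-of-loop processing begins
            return passes
        lc = (lc - 1) & LC_MASK     # otherwise decrement and go around again

def lc_from_accumulator(acc36):
    """LC value loaded when a 36-bit accumulator is the DO source operand."""
    ext = acc36 >> 31               # bits 35-31 (assumed extension-in-use test)
    if ext in (0b00000, 0b11111):
        word = (acc36 >> 16) & 0xFFFF   # no limiting: the MSP is used as is
    elif acc36 >> 35:
        word = 0x8000               # negative saturation constant
    else:
        word = 0x7FFF               # positive saturation constant
    return word & LC_MASK

print(loop_passes(3), loop_passes(0))
```

An initial LC of zero never matches the "equals one" test until the counter has wrapped through all 13-bit values, which gives the 2^13 passes mentioned in the note.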


Condition Codes Affected:

15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C

LF — Set when a DO loop is in progress
L — Set if data limiting occurred


Restrictions: 
The end-of-loop comparison previously described occurs at instruction fetch time. That is, LA is
compared with PC when the instruction at the LA-2 is being executed. Therefore, instructions that access
the program controller registers or change program flow cannot be used in locations LA-2, LA-1, or
LA.


Proper DO loop operation is not guaranteed if an instruction starting at the LA-2, LA-1, or LA specifies
one of the program controller registers SR, SP, LA, LC, or (implicitly) PC as a destination register.
Similarly, the HWS register may not be specified as a source or destination register in an instruction
starting at the LA-2, LA-1, or LA. Additionally, the HWS register cannot be specified as a source
register in the DO instruction itself, and LA cannot be used as a target for jumps to subroutine (that is, JSR
to LA). A DO instruction cannot be repeated using the REP instruction.


The following instructions cannot begin at the indicated position(s) near the end of a DO loop: 


At the LA-2, LA-1, and LA: 
DO 
MOVEC from HWS 
MOVEC to LA, LC, SR, SP, or HWS 
Any bit-field instruction on the Status Register (SR) 
Two-word instructions that read LC, SP, or HWS 


At the LA-1: 
ENDDO 
Single-word instructions that read LC, SP, or HWS 


At the LA:
Any two-word instruction (this restriction applies to the situation in which the DSP
simulator’s single-line assembler is used to change the last instruction in a DO loop from
a one-word instruction to a two-word instruction)
Bcc, Jcc        BRSET, BRCLR
BRA, JMP        REP
JSR             RTI, RTS
WAIT, STOP


Similarly, since the DO instruction accesses the program controller registers, the DO instruction must 
not be immediately preceded by any of the following instructions: 


Immediately Before DO: 
MOVEC to HWS 
MOVEC from HWS 


Other Restrictions: 
DO HWS,xxxx 
JSR to (LA) whenever the LF is set 
A DO instruction cannot be repeated using the REP instruction 


Example: 
DO #cnt1,END ; begin DO loop
MOVE X:(R0),A
REP #cnt2 ; nested REP loop
ASL A ; repeat this instruction
MOVE A,X:(R0)+ ; last instruction in DO loop
END : ; (outside DO loop)


Explanation of Example: 
This example illustrates a DO loop with a REP loop nested within the DO loop. In this example, “cnt1”
values are fetched from memory; each is left shifted by “cnt2” counts and is stored back in memory.
The DO loop executes “cnt1” times while the ASL instruction inside the REP loop executes (“cnt1” *
“cnt2”) times. The END label is located at the first instruction past the end of the DO loop, as
mentioned previously.


Instruction Fields: 


Operation  Operands     C  W  Comments
DO         #xx,xxxx     6  2  Load LC register with unsigned value and start
                              hardware DO loop with 6-bit immediate loop count.
                              The last address is 16-bit absolute. #xx = 0 not
                              allowed by assembler.
           DDDDD,xxxx   6  2  Load LC register with unsigned value. If LC is not
                              equal to zero, start hardware DO loop with 16-bit
                              loop count in register. Otherwise, skip body of loop
                              (adds three additional cycles). The last address is
                              16-bit absolute.
                              Any register allowed except: SP, M01, SR, OMR,
                              and HWS.

Timing: 6 oscillator clock cycles
Memory: 2 program words
ENDDO End Current DO Loop ENDDO


Operation: Assembler Syntax:
NL → LF ENDDO
HWS[1] → HWS[0]; 0 → NL


Description: Terminate the current hardware DO loop immediately. Normally, a hardware DO loop is terminated
when the last instruction of the loop is executed and the current LC equals one, but this instruction can
terminate a loop before normal completion. If the value of the current DO LC is needed, it must be read
before the execution of the ENDDO instruction. Initially, the LF is restored from the NL bit, and the
top-of-loop address is purged from the HWS. The contents of the second HWS location are written into
the first HWS location, and the NL bit is cleared.


Example:
DO Y0,ENDLP ; execute loop ending at ENDLP (Y0) times
MOVEC LC,A ; get current value of loop counter (LC)
CMP Y1,A ; compare loop counter with value in Y1
JNE CONTINU ; go to CONTINU if LC not equal to Y1
ENDDO ; LC equal to Y1, restore all DO registers
JMP ENDLP ; go to ENDLP
CONTINU : ; LC not equal to Y1, continue DO
; loop
: ; (last instruction in DO loop)
ENDLP MOVE #$1234,X0 ; (first instruction AFTER DO loop)


Explanation of Example:
This example illustrates the use of the ENDDO instruction to terminate the current DO loop. The value
of the LC is compared with the value in the Y1 register to determine if execution of the DO loop should
continue. The ENDDO instruction updates certain program controller registers but does not
automatically jump past the end of the DO loop. Thus, if this action is desired, a JMP/BRA instruction (that is,
JMP ENDLP as shown previously) must be included after the ENDDO instruction to transfer program
control to the first instruction past the end of the DO loop.


Note: The ENDDO instruction updates the program controller registers appropriately but does not
automatically jump past the end of the loop. If desired, this must be done explicitly by the programmer.


Restrictions:
Due to pipelining and the fact that the ENDDO instruction accesses the program controller registers,
the ENDDO instruction must not be immediately preceded by any of the following instructions:

MOVEC to SR or HWS
MOVEC from HWS
Any bit-field instruction on the SR

Also, the ENDDO instruction cannot be the next-to-last instruction in a DO loop (at the LA-1).


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Instruction Fields: 


Operation  Operands  C  W  Comments
ENDDO                2  1  Remove one value from the hardware stack and
                           update the NL and LF bits appropriately
                           Note: Does not branch to the end of the loop

Timing: 2 oscillator clock cycles
Memory: 1 program word
EOR Logical Exclusive OR EOR


Operation: Assembler Syntax:
S ⊕ D → D (no parallel move) EOR S,D (no parallel move)
S ⊕ D[31:16] → D[31:16] (no parallel move) EOR S,D (no parallel move)


where ⊕ denotes the logical exclusive OR operator


Description: Logically exclusive OR the source operand (S) with the destination operand (D) and store the result in 
the destination. This instruction is a 16-bit operation. If the destination is a 36-bit accumulator, the 
source is exclusive ORed with bits 31-16 of the accumulator. The remaining bits of the destination 
accumulator are not affected. 


Usage: This instruction is used for the logical exclusive OR of two registers. If it is desired to exclusive OR a 
16-bit immediate value with a register or memory location, then the EORC instruction is appropriate. 


Example: 
EOR Y1,B ; Exclusive OR Y1 with B1
Before Execution After Execution
5 5555 6789 5 AA55 6789
B2 B1 B0 B2 B1 B0
Y1 FF00 Y1 FF00


Explanation of Example: 
Prior to execution, the 16-bit Y1 register contains the value $FF00, and the 36-bit B accumulator
contains the value $5:5555:6789. The EOR Y1,B instruction logically exclusive ORs the 16-bit value in
the Y1 register with bits 31-16 of the B accumulator (B1) and stores the 36-bit result in the B
accumulator. The lower word of the accumulator (B0) and the extension byte (B2) are not affected by the
operation.
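A minimal Python sketch of the accumulator case, using the values from the example, shows that only bits 31-16 take part. The function name is hypothetical; the N and Z rules are the ones listed in the condition-code summary for this instruction.

```python
def eor_accumulator(source16, acc36):
    """XOR a 16-bit source with bits 31-16 of a 36-bit accumulator."""
    result = acc36 ^ ((source16 & 0xFFFF) << 16)   # B2 and B0 are untouched
    n = (result >> 31) & 1                          # N: bit 31 of the result
    z = int(((result >> 16) & 0xFFFF) == 0)         # Z: bits 31-16 all zero
    return result, n, z

# Values from the example: Y1 = $FF00, B = $5:5555:6789.
result, n, z = eor_accumulator(0xFF00, 0x5_5555_6789)
print(f"${result >> 32:X}:{(result >> 16) & 0xFFFF:04X}:{result & 0xFFFF:04X}  N={n} Z={z}")
```

The model returns $5:AA55:6789 with N set, which agrees with the After Execution column.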


Condition Codes Affected: 


15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C


N — Set if bit 31 of A or B result is set
Z — Set if bits 31-16 of A or B result are zero
V — Always cleared 
Instruction Fields:

Operation  Operands  C  W  Comments
EOR        DD,FDD    2  1  16-bit exclusive OR (XOR)
           F1,DD

Timing: 2 oscillator clock cycles
Memory: 1 program word
EORC Logical Exclusive OR Immediate EORC


Operation: Assembler Syntax:
#xxxx ⊕ X:<ea> → X:<ea> EORC #iiii,X:<ea>
#xxxx ⊕ D → D EORC #iiii,D


where ⊕ denotes the logical exclusive OR operator


Implementation Note: 
This instruction is an alias to the BFCHG instruction, and assembles as BFCHG with the 16-bit
immediate value as the bit mask. This instruction will disassemble as a BFCHG instruction.


Description: Logically exclusive OR a 16-bit immediate data value with the destination operand (D) and store the 
results back into the destination. C is also modified as described below. This instruction performs a 
read-modify-write operation on the destination and requires two destination accesses. 


Example: 
EORC #$0FF0,X:<<$FFE0 ; Exclusive OR with immediate data
Before Execution After Execution
X:$FFE0 5555 X:$FFE0 5AA5
SR 0000 SR 0000


Explanation of Example: 
Prior to execution, the 16-bit X memory location X:$FFE0 contains the value $5555. Execution of the
instruction tests the state of bits 4 through 11 in X:$FFE0 (the bits selected by the mask $0FF0); does
not set C (because not all of the selected bits were set); and then complements those bits.
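The test-then-complement behavior can be checked against the example values ($5555 in memory, mask $0FF0) with a short Python sketch. The function name is hypothetical, and the model covers only a 16-bit memory or register destination other than SR.

```python
def eorc(mask16, value16):
    """Return (new value, C) for EORC #mask,<destination>."""
    c = int((value16 & mask16) == mask16)   # C: all bits selected by the mask were set
    return (value16 ^ mask16) & 0xFFFF, c   # then complement the selected bits

# Values from the example: mask $0FF0, X:$FFE0 = $5555.
new_value, c = eorc(0x0FF0, 0x5555)
print(f"X:$FFE0 = ${new_value:04X}  C={c}")
```

Only four of the eight selected bits are set in $5555, so C stays clear and the location becomes $5AA5, as in the After Execution column.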


Condition Codes Affected: 


15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C


For destination operand SR:
— Changed if specified in the field
For other destination operands:
C — Set if all bits specified by the mask are set
Instruction Fields:

Operation  Operands           Comments
EORC       #xxxx,DDDDD        Implemented using the BFCHG instruction.
           #xxxx,X:(R2+xx)    All registers in DDDDD are permitted except HWS.
           #xxxx,X:(SP-xx)    X:aa represents a 6-bit absolute address. Refer to
           #xxxx,X:aa         Absolute Short Address (Direct Addressing):
           #xxxx,X:pp         <aa> on page 4-22.
                              X:pp represents a 6-bit absolute I/O address. Refer
                              to I/O Short Address (Direct Addressing): <pp>
                              on page 4-23.

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table
ILLEGAL Illegal Instruction Interrupt ILLEGAL


Operation: Assembler Syntax:
Begin illegal instruction exception routine ILLEGAL (no parallel move)


Description: Normal instruction execution is suspended and illegal instruction exception processing is initiated. The
interrupt priority level bits (I1 and I0) are set to 11 in the status register. The purpose of the illegal
interrupt is to force the DSP into an illegal instruction exception for test purposes. Executing an
ILLEGAL instruction is a fatal error; the exception routine should indicate this condition and cause the
system to be restarted.

If the ILLEGAL instruction is in a DO loop at the LA and the instruction at the LA-1 is being
interrupted, then LC will be decremented twice due to the same mechanism that causes LC to be
decremented twice if JSR, REP,... are located at the LA.

Since REP is uninterruptible, repeating an ILLEGAL instruction results in the interrupt not being taken
until after completion of the REP. After servicing the interrupt, program control will return to the
address of the second word following the ILLEGAL instruction. Of course, the ILLEGAL interrupt
service routine should abort further processing, and the processor should be reinitialized.


Usage: The ILLEGAL instruction provides a means for testing the interrupt service routine executed upon
discovering an illegal instruction. This allows a user to verify that the interrupt service routine can
correctly recover from an illegal instruction and restart the application. The ILLEGAL instruction is not
used in normal programming.


Example:
ILLEGAL


Explanation of Example: See the previous description.


Condition Codes Affected: 


The condition codes are not affected by this instruction. 


Instruction Fields: 


Operation  Operands  C  W  Comments
ILLEGAL              4  1  Execute the illegal instruction exception. This
                           instruction is made available so that code may be
                           written to test and verify interrupt handlers for illegal
                           instructions.

Timing: 4 oscillator clock cycles
Memory: 1 program word
IMPY(16) Integer Multiply IMPY(16)


Operation: Assembler Syntax:
(S1*S2) → D1 IMPY16 S1,S2,D (no parallel move)
sign-extend D2; leave D0 unchanged


Description: Perform an integer multiplication on the two 16-bit signed integer source operands (S1 and S2) and 
store the lowest 16 bits of the integer product in the upper word (D1) of the destination accumulator 
(D), leaving the lower word (D0) unchanged and sign extending the extension register (D2).


Usage: This instruction is useful in general computing when it is necessary to multiply two integers and the
nature of the computation can guarantee that the result fits in a 16-bit destination. In this case, it is
better to place the result in the MSP (A1 or B1) of an accumulator, because more instructions have access
to this portion than to the other portions of the accumulator.


Note: No overflow control or rounding is performed during integer multiply instructions. The result is always
a 16-bit signed integer result that is sign extended to 20 bits (D2:D1).


Example: 
IMPY Y0,X0,A ; form product
Before Execution After Execution
F AAAA 789A 0 000C 789A
A2 A1 A0 A2 A1 A0
X0 0003 X0 0003
Y0 0004 Y0 0004


Explanation of Example: 
Prior to execution, the data ALU registers X0 and Y0 contain, respectively, two 16-bit signed integer
values ($0003 and $0004). The contents of the destination accumulator are not important prior to
execution. Execution of the IMPY Y0,X0,A instruction integer multiplies X0 and Y0 and stores the
result ($000C) in A1. A0 remains unchanged, and A2 is sign extended.
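The way the product is placed in the accumulator can be sketched in Python using the example values. The function names are hypothetical, and the model covers only the data path described for this instruction (low 16 bits of the product into D1, D2 sign extended from D1, D0 untouched), not the condition codes.

```python
def signed16(word):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word

def impy16(s1, s2, acc36):
    """Store the low 16 bits of s1*s2 in D1, sign extend into D2, keep D0."""
    d1 = (signed16(s1) * signed16(s2)) & 0xFFFF
    d2 = 0xF if d1 & 0x8000 else 0x0
    return (d2 << 32) | (d1 << 16) | (acc36 & 0xFFFF)

# Values from the example: Y0 = $0004, X0 = $0003, A = $F:AAAA:789A.
a = impy16(0x0004, 0x0003, 0xF_AAAA_789A)
print(f"${a >> 32:X}:{(a >> 16) & 0xFFFF:04X}:{a & 0xFFFF:04X}")
```

The previous contents of A2 and A1 have no effect on the outcome; only A0 carries over, giving $0:000C:789A.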
Condition Codes Affected:

15  14  13  12  11  10  9   8   7   6   5   4   3   2   1   0
LF  *   *   *   *   *   I1  I0  SZ  L   E   U   N   Z   V   C

E — Not defined
U — Not defined
N — Set if bit 35 of the result is set except during saturation
Z — Set if the 20 MSBs of the result equal zero
V — Set if overflow occurs in the 16-bit result
Instruction Fields: 
Operation  Operands     C  W  Comments
IMPY(16)   Y1,X0,FDD    2  1  Integer 16x16 multiply with 16-bit result.
           Y0,X0,FDD
           Y1,Y0,FDD          When the destination register is F, the F0 portion is
           Y0,Y0,FDD          unchanged by the instruction.
           A1,Y0,FDD
           B1,Y1,FDD          Note: Assembler also accepts first two operands
                              when they are specified in opposite order.
Timing: 2 oscillator clock cycles 
Memory: 1 program word 
INC(W) Increment Word INC(W)


Operation: Assembler Syntax:
D2:D1 + 1 → D2:D1 (parallel move) INCW D (parallel move)


Description: Increment a 16-bit destination (D) or the two upper portions (A2:A1 or B2:B1) of a 36-bit accumulator. 
If the destination is a 36-bit accumulator, leave the LSP (AO or BO) unchanged. 


Usage: This instruction is typically used when processing integer data. 
Example: 
INCW A X:(RO),X0; Increment the 20 MSBs of A; update X0 
A Before Execution A After Execution 
0 0001 0033 0 0002 0033 
A2 Al AO A2 Al AO 


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $0:0001:0033. Execution of the 
INCW A instruction increments by one the upper 20 bits of the A accumulator. 
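
A minimal Python sketch of this behavior follows. It is a host-side illustration under stated assumptions: the function name and the (ext, msp, lsp) tuple for A2:A1:A0 are invented here, and saturation, the parallel move, and all condition codes other than the carry out of bit 35 are left out.

```python
def incw_accumulator(acc):
    """Hypothetical model of INCW on a 36-bit accumulator.

    acc is (ext, msp, lsp) for A2:A1:A0. Returns the new accumulator
    and the carry out of bit 35.
    """
    ext, msp, lsp = acc
    upper = ((ext & 0xF) << 16) | (msp & 0xFFFF)   # the 20 MSBs, A2:A1
    upper += 1
    carry = upper >> 20                            # 1 if the 20-bit field wrapped
    upper &= 0xFFFFF
    return (upper >> 16, upper & 0xFFFF, lsp), carry   # the LSP is untouched


result, carry = incw_accumulator((0x0, 0x0001, 0x0033))
print(f"{result[0]:X}:{result[1]:04X}:{result[2]:04X} carry={carry}")   # prints 0:0002:0033 carry=0
```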


Condition Codes Affected:

     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    SZ — Set according to the standard definition of the SZ bit (parallel move)
    L  — Set if limiting (parallel move) or overflow has occurred in result
    E  — Set if the signed integer portion of the result is in use
    U  — Set if result is unnormalized
    N  — Set if bit 35 of the result is set except during saturation
    Z  — Set if the 20 MSBs of the result are all zeros
    V  — Set if overflow has occurred in result
    C  — Set if a carry (or borrow) occurs from bit 35 of the result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

    Operation  Operands    C  W  Comments
    INC(W)     FDD         2  1  Increment word
               X:(SP-xx)   8  1  Increment word in memory using appropriate
                                 addressing mode.
               X:aa        6  1  X:aa represents a 6-bit absolute address. Refer to
               X:xxxx      8  2  Absolute Short Address (Direct Addressing):
                                 <aa> on page 4-22.

    Data ALU Operation          Parallel Memory Read or Write
    Operation  Registers        Memory Access   Source or Destination
    INC(W)     A                X:(Rn)+         X0
               B                X:(Rn)+N        Y1
                                                Y0
                                                A1
                                                B1
                                                A
                                                B
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table 
Jcc Jump Conditionally Jcc


Operation:                      Assembler Syntax:
If cc, then label → PC          Jcc XXXX
else PC+1 → PC


Description: If the specified condition is true, program execution continues at the effective address specified in the 
instruction. If the specified condition is false, the PC is incremented and program execution continues 
sequentially. The effective address is a 16-bit absolute address. The Bcc instruction, which is more 
compact, operates almost identically, and can be used for very short jumps. 


The term “cc” specifies the following: 


    “cc” Mnemonic                                   Condition
    CC (HS*) — carry clear (higher or same)         C=0
    CS (LO*) — carry set (lower)                    C=1
    EQ       — equal                                Z=1
    GE       — greater than or equal                N ⊕ V=0
    GT       — greater than                         Z+(N ⊕ V)=0
    LE       — less than or equal                   Z+(N ⊕ V)=1
    LT       — less than                            N ⊕ V=1
    NE       — not equal                            Z=0
    NN       — not normalized                       Z+(Ū • Ē)=0
    NR       — normalized                           Z+(Ū • Ē)=1

    * Only available when CC bit set in the OMR
    X̄ denotes the logical complement of X
    + denotes the logical OR operator
    • denotes the logical AND operator
    ⊕ denotes the logical exclusive OR operator
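
The condition table can be expressed as a small host-side Python function. This is a sketch for illustration only: the function name and keyword arguments are invented here, each flag is passed as 0 or 1, and the NN and NR entries assume that U and E appear complemented in the table, as the legend's complement notation suggests.

```python
def condition_true(cc, C=0, Z=0, N=0, V=0, U=0, E=0):
    """Evaluate a Jcc condition mnemonic from 0/1 condition code bits."""
    not_u_and_not_e = (1 - U) & (1 - E)      # assumed reading of the NN/NR term
    conditions = {
        "CC": C == 0, "HS": C == 0,          # HS and LO exist only when the CC bit is set in the OMR
        "CS": C == 1, "LO": C == 1,
        "EQ": Z == 1,
        "NE": Z == 0,
        "GE": (N ^ V) == 0,
        "LT": (N ^ V) == 1,
        "GT": (Z | (N ^ V)) == 0,
        "LE": (Z | (N ^ V)) == 1,
        "NN": (Z | not_u_and_not_e) == 0,
        "NR": (Z | not_u_and_not_e) == 1,
    }
    return conditions[cc.upper()]


print(condition_true("CS", C=1))        # prints True: a JCS would jump
print(condition_true("GT", Z=0, N=1, V=0))   # prints False: N and V differ
```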
Example: 
JCS LABEL ; jump to label if carry bit is set 
INCW A 
INCW A 
LABEL 
ADD B,A 


Explanation of Example: 
In this example, if C is one when executing the JCS instruction, program execution skips the two 
INCW instructions and continues with the ADD instruction. If the specified condition is not true, no 
jump is taken, the program counter is incremented by one, and program execution continues with the 
first INCW instruction. The Jcc instruction uses a 16-bit absolute address for this example. 


Restrictions: 


A Jcc instruction used within a DO loop cannot begin at the LA or LA-1 within that DO loop. 
A Jcc instruction cannot be repeated using the REP instruction. 
Condition Codes Affected:
The condition codes are tested but not modified by this instruction.


Instruction Fields:

    Operation  Operands  C    W  Comments
    Jcc        XXXX      6/4  2  16-bit absolute address

Timing:  4 + jx oscillator clock cycles
Memory:  2 program words
JMP Jump JMP


Operation:                      Assembler Syntax:
label → PC                      JMP XXXX


Description: Jump to program memory at the location given by the instruction’s effective address. The effective
address is a 16-bit absolute address.


Example: 
JMP LABEL 


Explanation of Example: 
In this example, program execution is transferred to the address represented by label. The DSP core 
supports up to 16-bit program addresses. 


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Restrictions: 
A JMP instruction used within a DO loop cannot begin at the LA within that DO loop. 
A JMP instruction cannot be repeated using the REP instruction. 


Instruction Fields: 


    Operation  Operands  C  W  Comments
    JMP        XXXX      6  2  16-bit absolute address

Timing:  6 + jx oscillator clock cycles
Memory:  2 program words
JSR Jump to Subroutine JSR


Operation:                      Assembler Syntax:
SP+1 → SP                       JSR XXXX
PC → X:(SP)
SP+1 → SP
SR → X:(SP)
XXXX → PC


Description: Jump to subroutine in program memory at the location given by the instruction’s effective address. The 
effective address is a 16-bit absolute address. 
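
The push sequence in the Operation field can be traced with a short Python model. This is a sketch under assumptions, not a description of the hardware: the dictionary layout, the function name, and the treatment of `pc` as the return address (the address following the two-word JSR) are choices made here for illustration.

```python
def jsr(cpu, target):
    """Hypothetical model of JSR: push PC, push SR, then load the target into PC.

    cpu holds 'pc', 'sp', 'sr' (16-bit values) and 'x', a dict standing in
    for X data memory. A modified copy is returned.
    """
    cpu = dict(cpu, x=dict(cpu["x"]))        # leave the caller's state untouched
    cpu["sp"] = (cpu["sp"] + 1) & 0xFFFF     # SP+1 -> SP
    cpu["x"][cpu["sp"]] = cpu["pc"]          # PC -> X:(SP)
    cpu["sp"] = (cpu["sp"] + 1) & 0xFFFF     # SP+1 -> SP
    cpu["x"][cpu["sp"]] = cpu["sr"]          # SR -> X:(SP)
    cpu["pc"] = target & 0xFFFF              # target -> PC
    return cpu


state = {"pc": 0x0102, "sp": 0x0040, "sr": 0x0300, "x": {}}
after = jsr(state, 0x2000)
print(f"SP={after['sp']:04X} PC={after['pc']:04X}")   # prints SP=0042 PC=2000
```

The stack grows upward in this model because the Operation field increments SP before each store, so the saved SR ends up at the top of the stack, one word above the saved PC.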


Example: 


JSR LABEL ; jump to absolute address indicated by LABEL 


Explanation of Example: 
In this example, program execution is transferred to the subroutine at the address represented by
LABEL. The DSP core supports up to 16-bit program addresses.


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Restrictions: 
A JSR instruction used within a DO loop cannot begin at the LA within that DO loop. 
A JSR instruction used within a DO loop cannot specify the LA as its target. 
A JSR instruction cannot be repeated using the REP instruction. 


Instruction Fields:

    Operation  Operands  C  W  Comments
    JSR        XXXX      8  2  Push return address and status register and jump to
                               16-bit target address

Timing:  8 + jx oscillator clock cycles
Memory:  2 program words
LEA Load Effective Address LEA


Operation:                      Assembler Syntax:
ea → D (no parallel move)       LEA ea


Description: The address calculation specified is executed and the resulting effective address (ea) is stored in the
destination register (D). The source address register and the update mode used to compute the updated
address are specified by the effective address. The source address register specified in the effective
address is not updated. All update addressing modes may be used. The new register contents are available
for use by the immediately following instruction.


Example:
    LEA     (R0)+N          ; update R0 using (R0)+N

    Before Execution            After Execution
    R0   8001                   R0   8C02
    N    0C01                   N    0C01
    M01  1000                   M01  1000


Explanation of Example: 

Prior to execution, the 16-bit address register R0 contains the value $8001, the 16-bit address register
N contains the value $0C01, and the 16-bit modulo register M01 contains the value $1000. Execution
of the LEA (R0)+N instruction adds the contents of the R0 register to the contents of the N register
and stores the resulting updated address in the R0 address register. The addition is performed using
modulo arithmetic since it is done with the R0 register and M01 is not equal to $FFFF. No wraparound
occurs during the addition because the result falls within the boundaries of the modulo buffer.
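
The following Python sketch shows why no wraparound occurs in this example. It is a host-side illustration built on assumptions that are not stated in this section: the buffer size is taken to be M01+1, the lower boundary is taken to be R0 with its k least significant bits cleared (k being the smallest value with 2^k greater than or equal to the buffer size), N is treated as a signed offset, and at most one wrap is applied. The function names are invented here, and reserved M01 values are not checked.

```python
def signed16(value):
    """Interpret a 16-bit word as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def post_update_by_n(r, n, m01):
    """Hypothetical model of the (Rn)+N address update for R0 or R1."""
    if m01 == 0xFFFF:                        # linear arithmetic
        return (r + n) & 0xFFFF
    modulus = m01 + 1                        # assumed buffer size
    k = (modulus - 1).bit_length()           # smallest k with 2**k >= modulus
    lower = r & ~((1 << k) - 1) & 0xFFFF     # assumed lower boundary of the buffer
    upper = lower + m01                      # assumed upper boundary of the buffer
    result = r + signed16(n)
    if result > upper:
        result -= modulus                    # wrap past the top of the buffer
    elif result < lower:
        result += modulus                    # wrap past the bottom of the buffer
    return result & 0xFFFF


print(f"{post_update_by_n(0x8001, 0x0C01, 0x1000):04X}")   # prints 8C02
```

Under these assumptions the buffer in the example spans $8000 through $9000, so $8001 + $0C01 = $8C02 stays inside it and is stored unchanged.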


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Instruction Fields:

    Operation  Operands    C  W  Comments
    LEA        (Rn)+       2  1  Increment the Rn pointer register
               (Rn)-       2  1  Decrement the Rn pointer register
               (Rn)+N      2  1  Add first operand to the second and store the result
                                 in the second operand
               (R2+xx)     2  1  Add a 6-bit unsigned immediate value to R2 and
                                 store in the R2 pointer
               (SP-xx)     2  1  Subtract a 6-bit unsigned immediate value from SP
                                 and store in the SP register
               (Rn+xxxx)   4  2  Add a 16-bit signed immediate value to the specified
                                 source register.

Timing:  2+ea oscillator clock cycles
Memory:  1+ea program words
LSL Logical Shift Left LSL


Operation:                      Assembler Syntax:
(see following figure)          LSL D (no parallel move)

[Figure: D1 is shifted left by one bit; the MSB of D1 moves into C and a zero enters the LSB of D1.
D2 and D0 are unchanged.]


Description: Logically shift 16 bits of the destination operand (D) 1 bit to the left and store the result in the
destination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator (A1
or B1), and the remaining portions of the accumulator (A2, B2, A0, B0) are not modified. The MSB
of the destination (bit 31 if the destination is a 36-bit accumulator) prior to the execution of the
instruction is shifted into C, and zero is shifted into the LSB of D1 (bit 16 if the destination is a 36-bit
accumulator).


Example:
    LSL     B               ; multiply B1 by 2

    Before Execution            After Execution
    B   6  8000  00AA           B   6  0000  00AA
        B2 B1    B0                 B2 B1    B0
    SR  0300                    SR  0305


Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $6:8000:00AA. Execution of the
LSL B instruction shifts the 16-bit value in the B1 register 1 bit to the left and stores the result back
in the B1 register. C is set by the operation because bit 31 of B was set prior to the execution of the
instruction. The Z bit of CCR (bit 2) is also set because the result in B1 is zero.
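
The register and status values in this example can be checked with a short Python model. This is an illustrative host-side sketch: the function name and the (ext, msp, lsp) tuple are assumptions made here, and only the N, Z, V, and C bits (SR bits 3 through 0) are updated.

```python
def lsl_accumulator(acc, sr):
    """Hypothetical model of LSL on a 36-bit accumulator.

    acc is (ext, msp, lsp); sr is the 16-bit status register.
    """
    ext, msp, lsp = acc
    carry = (msp >> 15) & 1                  # old MSB of the MSP goes to C
    result = (msp << 1) & 0xFFFF             # zero enters the LSB of the MSP
    n = (result >> 15) & 1
    z = 1 if result == 0 else 0
    sr = (sr & 0xFFF0) | (n << 3) | (z << 2) | carry   # V (bit 1) is always cleared
    return (ext, result, lsp), sr


acc, sr = lsl_accumulator((0x6, 0x8000, 0x00AA), 0x0300)
print(f"{acc[0]:X}:{acc[1]:04X}:{acc[2]:04X} SR={sr:04X}")   # prints 6:0000:00AA SR=0305
```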
Condition Codes Affected:

     <------------ MR ------------>  <------------ CCR ----------->
     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    L — Set if overflow has occurred in result
    N — Set if bit 31 of A or B result is set
    Z — Set if A1 or B1 result equals zero
    V — Always cleared
    C — Set if bit 31 of A or B was set prior to the execution of the instruction

Instruction Fields:

    Operation  Operands  C  W  Comments
    LSL        FDD       2  1  1-bit logical shift left of word

Timing:  2 oscillator clock cycles
Memory:  1 program word
LSLL Multi-Bit Logical Left Shift LSLL


Operation:                          Assembler Syntax:
S1<<S2 → D (no parallel move)       LSLL S1,S2,D (no parallel move)


Description: Logically shift the first 16-bit source operand (S1) to the left by the value contained in the lowest 4 bits
of the second source operand (S2) and store the result in the destination register (D). The destination
must always be a 16-bit register.


Implementation Note:
This instruction is actually implemented by the assembler using the ASLL instruction. It will
disassemble as ASLL.


Example:
    LSLL    Y1,X0,Y1        ; left shift of 16-bit Y1 by X0

    Before Execution            After Execution
    Y1  AAAA                    Y1  AAA0
    X0  0004                    X0  0004


Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($AAAA) and the X0 register
contains the amount to shift by ($0004). The contents of the destination register are not important prior to
execution because they have no effect on the calculated value. The LSLL instruction logically shifts
the value $AAAA four bits to the left and places the result in the destination register Y1.
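
As a quick host-side check of this example, the shift can be written in Python. The function name is invented for illustration, and the flags returned are limited to the N and Z bits named in the condition code list.

```python
def lsll(s1, s2):
    """Hypothetical model of LSLL with a 16-bit destination.

    Returns (result, n, z). Only the four LSBs of s2 select the shift count.
    """
    count = s2 & 0xF
    result = (s1 << count) & 0xFFFF          # bits shifted past bit 15 are lost
    n = (result >> 15) & 1
    z = 1 if result == 0 else 0
    return result, n, z


result, n, z = lsll(0xAAAA, 0x0004)
print(f"{result:04X} N={n} Z={z}")           # prints AAA0 N=1 Z=0
```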


Condition Codes Affected:

     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    N — Set if bit 15 of result is set except during saturation
    Z — Set if the result in D is zero


Instruction Fields:

    Operation  Operands     C  W  Comments
    LSLL       Y1,X0,FDD    2  1  Logical shift left of the first operand by value
               Y0,X0,FDD          specified in four LSBs of the second operand; places
               Y1,Y0,FDD          result in FDD
               Y0,Y0,FDD
               A1,Y0,FDD          Implemented using ASLL instruction
               B1,Y1,FDD

Timing:  2 oscillator clock cycles
Memory:  1 program word
LSR Logical Shift Right LSR


Operation:                      Assembler Syntax:
(see following figure)          LSR D (no parallel move)

[Figure: D1 is shifted right by one bit; a zero enters the MSB of D1 and the LSB of D1 moves into C.
D2 and D0 are unchanged.]


Description: Logically shift 16 bits of the destination operand (D) 1 bit to the right and store the result in the
destination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator (A1
or B1), and the remaining portions of the accumulator (A2, B2, A0, B0) are not modified. The LSB of
the destination (bit 16 if the destination is a 36-bit accumulator) prior to the execution of the instruction
is shifted into C, and zero is shifted into the MSB of D1 (bit 31 if the destination is a 36-bit
accumulator).


Example:
    LSR     B               ; divide B1 by 2 (B1 considered unsigned)

    Before Execution            After Execution
    B   F  0001  00AA           B   F  0000  00AA
        B2 B1    B0                 B2 B1    B0
    SR  0300                    SR  0305


Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $F:0001:00AA. Execution of the
LSR B instruction shifts the 16-bit value in the B1 register 1 bit to the right and stores the result back
in the B1 register. C is set by the operation because bit 0 of B1 was set prior to the execution of the
instruction. The Z bit of CCR (bit 2) is also set because the result in B1 is zero.
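
A short Python model reproduces the values in this example. It is a host-side sketch for illustration: the function name and the (ext, msp, lsp) tuple are assumptions made here, and only the N, Z, V, and C bits (SR bits 3 through 0) are updated.

```python
def lsr_accumulator(acc, sr):
    """Hypothetical model of LSR on a 36-bit accumulator.

    acc is (ext, msp, lsp); sr is the 16-bit status register.
    """
    ext, msp, lsp = acc
    carry = msp & 1                          # old LSB of the MSP goes to C
    result = (msp & 0xFFFF) >> 1             # zero enters the MSB of the MSP
    z = 1 if result == 0 else 0
    sr = (sr & 0xFFF0) | (z << 2) | carry    # N and V are always cleared
    return (ext, result, lsp), sr


acc, sr = lsr_accumulator((0xF, 0x0001, 0x00AA), 0x0300)
print(f"{acc[0]:X}:{acc[1]:04X}:{acc[2]:04X} SR={sr:04X}")   # prints F:0000:00AA SR=0305
```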
Condition Codes Affected:

     <------------ MR ------------>  <------------ CCR ----------->
     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    L — Set if data limiting has occurred during parallel move
    N — Always cleared
    Z — Set if A1 or B1 result equals zero
    V — Always cleared
    C — Set if bit 16 of A or B was set prior to the execution of the instruction

Instruction Fields:

    Operation  Operands  C  W  Comments
    LSR        FDD       2  1  1-bit logical shift right of word

Timing:  2 oscillator clock cycles
Memory:  1 program word
LSRAC Logical Right Shift with Accumulate LSRAC


Operation:                          Assembler Syntax:
S1>>S2+D → D (no parallel move)     LSRAC S1,S2,D (no parallel move)


Description: Logically shift the first 16-bit source operand (S1) to the right by the value contained in the lowest 4
bits of the second source operand (S2), and accumulate the result with the value in the destination
register (D).

Usage: This instruction is used for multi-precision logical right shifts.

Example:
    LSRAC   Y1,X0,A         ; 16-bit add

    Before Execution            After Execution
    A   0  0000  0099           A   0  0C00  3099
        A2 A1    A0                 A2 A1    A0
    Y1  C003                    Y1  C003
    X0  0004                    X0  0004


Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($C003), the X0 register contains
the amount by which to shift ($0004), and the destination accumulator contains $0:0000:0099. The
LSRAC instruction logically shifts the value $C003 four bits to the right and accumulates this result
with the value already in the destination register A. Since the destination is an accumulator, the
extension word (A2) is filled with sign extension.
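
The After Execution value of A0 ($3099) shows that bits shifted out of the MSP are carried into the LSP before the accumulation. A minimal Python sketch of that reading follows. It is a host-side illustration: the function name and the use of a single 36-bit integer for A2:A1:A0 are choices made here, and condition codes are not modeled.

```python
MASK36 = 0xFFFFFFFFF


def lsrac(acc36, s1, s2):
    """Hypothetical model of LSRAC on a 36-bit accumulator held as an integer."""
    count = s2 & 0xF                               # only the four LSBs select the shift count
    shifted = ((s1 & 0xFFFF) << 16) >> count       # S1 sits in the MSP; low bits fall into the LSP
    return (acc36 + shifted) & MASK36


result = lsrac(0x000000099, 0xC003, 0x0004)
print(f"{result >> 32:X}:{(result >> 16) & 0xFFFF:04X}:{result & 0xFFFF:04X}")   # prints 0:0C00:3099
```

Chaining this with a shift of the lower word is what makes the instruction useful for multi-precision right shifts: the bits that leave the upper word are not lost but land in the upper bits of the lower word.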


Condition Codes Affected:

     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    N — Set if bit 35 of A or B result is set except during saturation
    Z — Set if A or B result equals zero

See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

    Operation  Operands   C  W  Comments
    LSRAC      Y1,X0,F    2  1  Logical word shifting with accumulation
               Y0,X0,F
               Y1,Y0,F
               Y0,Y0,F
               A1,Y0,F
               B1,Y1,F

Timing:  2 oscillator clock cycles
Memory:  1 program word
LSRR Multi-Bit Logical Right Shift LSRR


Operation:                          Assembler Syntax:
S1>>S2 → D (no parallel move)       LSRR S1,S2,D (no parallel move)


Description: Logically shift the first 16-bit source operand (S1) to the right by the value contained in the lowest 4
bits of the second source operand (S2), and store the result in the destination register (D). If the
destination is a 36-bit accumulator, correctly zero extend into the extension register (A2 or B2) and place
zero in the LSP (A0 or B0).


Example:
    LSRR    Y1,X0,A         ; right shift of 16-bit Y1 by X0

    Before Execution            After Execution
    A   0  3456  3456           A   0  0AAA  0000
        A2 A1    A0                 A2 A1    A0
    Y1  AAAA                    Y1  AAAA
    X0  0004                    X0  0004


Explanation of Example:
Prior to execution, the Y1 register contains the value to be shifted ($AAAA), and the X0 register
contains the amount by which to shift ($0004). The contents of the destination register are not important
prior to execution because they have no effect on the calculated value. The LSRR instruction logically
shifts the value $AAAA four bits to the right and places the result in the destination register (A). Since
the destination is an accumulator, the extension word (A2) is zero extended, and the LSP
(A0) is set to zero.
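
For an accumulator destination, the result can be sketched in Python as follows. This is a host-side illustration: the function name and the (ext, msp, lsp) tuple are assumptions made here, the extension register is zero filled as the Description states, and condition codes are not modeled.

```python
def lsrr_to_accumulator(s1, s2):
    """Hypothetical model of LSRR writing to a 36-bit accumulator.

    Returns (ext, msp, lsp) for the destination.
    """
    count = s2 & 0xF                       # only the four LSBs select the shift count
    msp = (s1 & 0xFFFF) >> count           # zeros enter from the left
    return (0x0, msp, 0x0000)              # extension zero filled, LSP cleared


ext, msp, lsp = lsrr_to_accumulator(0xAAAA, 0x0004)
print(f"{ext:X}:{msp:04X}:{lsp:04X}")      # prints 0:0AAA:0000
```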


Condition Codes Affected:

     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    N — Set if bit 35 of A or B result is set except during saturation
    Z — Set if A or B result equals zero


Instruction Fields:

    Operation  Operands     C  W  Comments
    LSRR       Y1,X0,FDD    2  1  Logical shift right of the first operand by value
               Y0,X0,FDD          specified in four LSBs of the second operand; places
               Y1,Y0,FDD          result in FDD (when result is to an accumulator F,
               Y0,Y0,FDD          zero extends into F2)
               A1,Y0,FDD
               B1,Y1,FDD

Timing:  2 oscillator clock cycles
Memory:  1 program word
MAC Multiply-Accumulate MAC


Operation:                              Assembler Syntax:
D+S1*S2 → D (no parallel move)          MAC (±)S1,S2,D (no parallel move)
D+S1*S2 → D (one parallel move)         MAC S1,S2,D (one parallel move)
D+S1*S2 → D (two parallel reads)        MAC S1,S2,D (two parallel reads)


Description: Multiply the two signed 16-bit source operands (S1 and S2) and add or subtract the product to or from
the specified 36-bit destination accumulator (D). The “-” sign option is used to negate the specified
product prior to accumulation. This option is not available when a single parallel move is performed
or when two parallel read operations are performed.


Usage: This instruction is used for multiplication and accumulation of fractional data or integer data when a
full 32-bit product is required (see Section 3.3.5.2, “Integer Multiplication,” on page 3-20). When the
destination is a 16-bit register, this instruction is useful only for fractional data.


Example:
    MAC     X0,Y1,A     X:(R1)+,Y1      X:(R3)+,X0

    Before Execution            After Execution
    A   0  0003  0003           A   0  0553  0003
        A2 A1    A0                 A2 A1    A0
    X0  4000                    X0  4000
    Y1  0AA0                    Y1  0AA0


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000, the 16-bit Y1 register contains the
value $0AA0, and the 36-bit A accumulator contains the value $0:0003:0003. Execution of the
MAC X0,Y1,A instruction multiplies the 16-bit signed value in the X0 register by the 16-bit signed
value in Y1, adds the resulting 32-bit product to the 36-bit A accumulator, and stores the result
($0:0553:0003) into the A accumulator. In parallel, X0 and Y1 are updated with new values fetched
from data memory, and the two address registers (R1 and R3) are post-incremented by one.
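
The arithmetic in this example can be verified with a small Python model. It is a host-side sketch for illustration: the function names and the use of one 36-bit integer for A2:A1:A0 are choices made here, the fractional product is modeled as the signed integer product shifted left by one bit, and saturation, condition codes, and the parallel reads are not modeled.

```python
MASK36 = 0xFFFFFFFFF


def signed16(value):
    """Interpret a 16-bit word as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def mac(acc36, s1, s2, negate=False):
    """Hypothetical model of a fractional MAC into a 36-bit accumulator."""
    product = (signed16(s1) * signed16(s2)) << 1   # align the fractional product with A1:A0
    if negate:
        product = -product                         # the "-" sign option
    return (acc36 + product) & MASK36


result = mac(0x000030003, 0x4000, 0x0AA0)
print(f"{result >> 32:X}:{(result >> 16) & 0xFFFF:04X}:{result & 0xFFFF:04X}")   # prints 0:0553:0003
```

Here $4000 represents 0.5, so the product is half of $0AA0, or $0550:0000 as a 32-bit value, which added to $0003:0003 gives the $0553:0003 shown in the example.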


Condition Codes Affected:

     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    SZ — Set according to the standard definition of the SZ bit (parallel move)
    L  — Set if limiting (parallel move) or overflow has occurred in result
    E  — Set if the signed integer portion of A or B result is in use
    U  — Set according to the standard definition of the U bit
    N  — Set if bit 35 of A or B result is set except during saturation
    Z  — Set if A or B result equals zero
    V  — Set if overflow has occurred in A or B result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.

Instruction Fields:

    Operation  Operands        C  W  Comments
    MAC        (±)Y1,X0,FDD    2  1  Fractional multiply accumulate; multiplication result
               (±)Y0,X0,FDD          optionally negated before accumulation
               (±)Y1,Y0,FDD
               (±)Y0,Y0,FDD
               (±)A1,Y0,FDD
               (±)B1,Y1,FDD

    Data ALU Operation          Parallel Memory Read or Write
    Operation  Registers        Memory Access   Source or Destination
    MAC        Y1,B1,F          X:(Rn)+         X0
               Y0,Y0,F          X:(Rn)+N        Y1
               Y0,A1,F                          Y0
               X0,Y0,F                          A
               X0,Y1,F                          B
               Y0,Y1,F                          A1
                                                B1
               (F = A or B)

    Data ALU Operation      First and Second Memory Reads   Destinations for Memory Reads
    Operation  Registers    Read1       Read2               Destination1    Destination2
    MAC        Y0,X0,F      X:(R0)+     X:(R3)+             Y0              X0
               Y1,X0,F      X:(R0)+N    X:(R3)-
               Y1,Y0,F      X:(R1)+                         Y1              X0
               (F = A or B) X:(R1)+N
                                                            Valid           Valid
                                                            destinations    destinations
                                                            for Read1       for Read2

Timing:  2 + mv oscillator clock cycles for MAC instructions with a parallel move
         Refer to previous table for MAC instructions without a parallel move

Memory:  1 program word for MAC instructions with a parallel move
         Refer to previous table for MAC instructions without a parallel move
MACR Multiply-Accumulate and Round MACR


Operation:                              Assembler Syntax:
D+S1*S2+r → D (no parallel move)        MACR (±)S1,S2,D (no parallel move)
D+S1*S2+r → D (one parallel move)       MACR S1,S2,D (one parallel move)
D+S1*S2+r → D (two parallel reads)      MACR S1,S2,D (two parallel reads)


Description: Multiply the two signed 16-bit source operands (S1 and S2), add or subtract the product to or from the
specified 36-bit destination accumulator (D), and round the result using the specified rounding. The
rounded result is stored in the destination accumulator. (Refer to RND for more complete information
on the convergent rounding process.) The “-” sign option is used to negate the specified product prior
to accumulation. This option is not available when a single parallel move or two parallel reads are
performed. The default sign option is “+”.


Usage: This instruction is used for the multiplication, accumulation, and rounding of fractional data.

Example:
    MACR    -X0,Y1,A

    Before Execution            After Execution
    A   0  0003  8000           A   0  2004  0000
        A2 A1    A0                 A2 A1    A0
    X0  4000                    X0  4000
    Y1  C000                    Y1  C000


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000, the 16-bit Y1 register contains the
value $C000, and the 36-bit A accumulator contains the value $0:0003:8000. Execution of the
MACR -X0,Y1,A instruction multiplies the 16-bit signed value in the X0 register by the 16-bit
signed value in Y1 and subtracts the resulting 32-bit product from the 36-bit A accumulator, rounds
the result, and stores the result ($0:2004:0000) into the A accumulator. In this example, the default
rounding (convergent rounding) is performed.
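
The rounding step in this example is easier to follow in code. The Python sketch below is a host-side illustration only: the function names and the single 36-bit integer for A2:A1:A0 are choices made here, convergent rounding is modeled as round-half-to-even on the upper 20 bits, and saturation and condition codes are not modeled.

```python
MASK36 = 0xFFFFFFFFF


def signed16(value):
    """Interpret a 16-bit word as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def macr(acc36, s1, s2, negate=False):
    """Hypothetical model of MACR with convergent rounding."""
    product = (signed16(s1) * signed16(s2)) << 1   # fractional product aligned with A1:A0
    if negate:
        product = -product
    total = (acc36 + product) & MASK36
    upper, lsp = total >> 16, total & 0xFFFF
    # Round up above the halfway point, or exactly at it when the upper part is odd.
    if lsp > 0x8000 or (lsp == 0x8000 and upper & 1):
        upper += 1
    return (upper << 16) & MASK36                  # the LSP is cleared after rounding


result = macr(0x000038000, 0x4000, 0xC000, negate=True)
print(f"{result >> 32:X}:{(result >> 16) & 0xFFFF:04X}:{result & 0xFFFF:04X}")   # prints 0:2004:0000
```

The intermediate sum is $0:2003:8000, which lies exactly halfway between two 16-bit results. Because $2003 is odd, the tie rounds up to the even value $2004; a sum of $0:2004:8000 would round down to $2004 instead.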
Condition Codes Affected:

     15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    SZ — Set according to the standard definition of the SZ bit (parallel move)
    L  — Set if limiting (parallel move) or overflow has occurred in result
    E  — Set if the signed integer portion of A or B result is in use
    U  — Set according to the standard definition of the U bit
    N  — Set if bit 35 of A or B result is set except during saturation
    Z  — Set if A or B result equals zero
    V  — Set if overflow has occurred in A or B result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

    Operation  Operands        C  W  Comments
    MACR       (±)Y1,X0,FDD    2  1  Fractional MAC with round, multiplication result
               (±)Y0,X0,FDD          optionally negated before addition.
               (±)Y1,Y0,FDD
               (±)Y0,Y0,FDD
               (±)A1,Y0,FDD
               (±)B1,Y1,FDD

    Data ALU Operation          Parallel Memory Read or Write
    Operation  Registers        Memory Access   Source or Destination
    MACR       Y1,B1,F          X:(Rn)+         X0
               Y0,Y0,F          X:(Rn)+N        Y1
               Y0,A1,F                          Y0
               X0,Y0,F                          A
               X0,Y1,F                          B
               Y0,Y1,F                          A1
                                                B1
               (F = A or B)

    Data ALU Operation      First and Second Memory Reads   Destinations for Memory Reads
    Operation  Registers    Read1       Read2               Destination1    Destination2
    MACR       Y0,X0,F      X:(R0)+     X:(R3)+             Y0              X0
               Y1,X0,F      X:(R0)+N    X:(R3)-
               Y1,Y0,F      X:(R1)+                         Y1              X0
               (F = A or B) X:(R1)+N
                                                            Valid           Valid
                                                            destinations    destinations
                                                            for Read1       for Read2

Timing:  2 + mv oscillator clock cycles for MACR instructions with a parallel move
         Refer to previous table for MACR instructions without a parallel move

Memory:  1 program word for MACR instructions with a parallel move
         Refer to previous table for MACR instructions without a parallel move
MACSU Multiply-Accumulate Signed x Unsigned MACSU


Operation:                                  Assembler Syntax:
D+S1*S2 → D (S1 signed, S2 unsigned)        MACSU S1,S2,D (no parallel move)


Description: Multiply the two 16-bit source operands (S1 and S2) and add the product to the specified 36-bit
destination accumulator (D). S1 is treated as signed, and S2 is always considered unsigned. This mixed
arithmetic multiply-accumulate does not allow a parallel move and can be used for multi-precision
multiplications.


Usage: In addition to single-precision multiplication of a signed-times-unsigned value and accumulation, this
instruction is also used for multi-precision multiplications, as shown in Section 3.3.8.2,
“Multi-Precision Multiplication,” on page 3-23.


Example:
    MACSU   X0,Y0,A

    Before Execution            After Execution
    A   0  0000  0099           A   0  3456  0099
        A2 A1    A0                 A2 A1    A0
    X0  3456                    X0  3456
    Y0  8000                    Y0  8000


Explanation of Example:
The 16-bit X0 register contains the value $3456 and the 16-bit Y0 register contains the value $8000.
Execution of the MACSU X0,Y0,A instruction multiplies the 16-bit signed value in the X0 register
by the 16-bit unsigned value in Y0, and then adds the result to the A accumulator and stores the signed
result back into the A accumulator. If this were a MAC instruction, Y0 ($8000) would equal -1.0, and
the multiplication result would be $F:CBAA:0000. Since this is a MACSU instruction, Y0 is
considered unsigned and equals +1.0. This gives a multiplication result of $0:3456:0000.
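
The difference between the signed and the signed-times-unsigned products in this example can be checked with a short Python model. It is a host-side sketch for illustration: the function names and the single 36-bit integer for A2:A1:A0 are choices made here, the fractional product is modeled as the integer product shifted left by one bit, and saturation and condition codes are not modeled.

```python
MASK36 = 0xFFFFFFFFF


def signed16(value):
    """Interpret a 16-bit word as a two's-complement signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def macsu(acc36, s1, s2):
    """Hypothetical model of MACSU: S1 is signed, S2 is unsigned."""
    product = (signed16(s1) * (s2 & 0xFFFF)) << 1
    return (acc36 + product) & MASK36


def signed_product(s1, s2):
    """The product a MAC would form, with both operands signed."""
    return ((signed16(s1) * signed16(s2)) << 1) & MASK36


def show(value):
    return f"{value >> 32:X}:{(value >> 16) & 0xFFFF:04X}:{value & 0xFFFF:04X}"


print(show(macsu(0x000000099, 0x3456, 0x8000)))    # prints 0:3456:0099
print(show(signed_product(0x3456, 0x8000)))        # prints F:CBAA:0000
```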
Condition Codes Affected:

     LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

    E — Set if the signed integer portion of A or B result is in use
    U — Set according to the standard definition of the U bit
    N — Set if bit 35 of A or B result is set except during saturation
    Z — Set if A or B result equals zero
    V — Set if overflow has occurred in A or B result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

    Operation  Operands     C  W  Comments
    MACSU      X0,Y1,FDD    2  1  Signed or unsigned 16x16 fractional MAC with
               X0,Y0,FDD          32-bit result.
               Y0,Y1,FDD
               Y0,Y0,FDD          The first operand is treated as signed and the
               Y0,A1,FDD          second as unsigned.
               Y1,B1,FDD

Timing:  2 oscillator clock cycles
Memory:  1 program word
MOVE Introduction to DSP56800 Moves MOVE


Description: The DSP56800 Family instruction set contains a powerful set of moves, resulting not only in better
DSP performance, but in simpler, more efficient general-purpose computing. The powerful set of
controller and DSP moves results not only in ease of programming, but in more efficient code that, in turn,
results in reduced power consumption for an application. This description gives an introduction to all
of the different types of moves available on the DSP56800 architecture. It covers all of the variations
of the MOVE instruction, as well as all of the parallel moves. The following types of moves are available
on the DSP56800:

• Any register ↔ any register
• Any register ↔ X data memory
• Any register ↔ on-chip peripheral register
• Immediate data → any register
• Immediate data → X data memory
• Immediate data → on-chip peripheral register
• Register ↔ program memory
• One X data memory access in parallel with an arithmetic operand (single parallel move)
• Two X data memory reads in parallel with an arithmetic operand (dual parallel read)
• Two X data memory reads in parallel with no arithmetic operand specified (MOVE only)
• Conditional register transfer (transfer only if condition is true)
• Register transfer through the data ALU


The preceding move types are discussed in detail under the following DSP56800 instructions:

MOVE:
• One X data memory access in parallel with an arithmetic operand (single parallel move)
• Two X data memory reads in parallel with an arithmetic operand (dual parallel read)
• Two X data memory reads in parallel with no arithmetic operand specified (MOVE only)
MOVE(C):
• Any register ↔ any register
• Any register ↔ X data memory
• Any register ↔ on-chip peripheral register
MOVE(I):
• Immediate data → any register
• Immediate data → X data memory
MOVE(M):
• Register ↔ program memory
MOVE(P):
• Register ↔ on-chip peripheral register
• Immediate data → on-chip peripheral register
MOVE(S):
• Register ↔ first 64 locations of X data memory
• Immediate data → first 64 locations of X data memory
Tcc:
• Conditional register transfer (transfer only if condition is true)
TFR:
• Register transfer through the data ALU


Description: Two types of parallel moves are permitted—register-to-memory moves and dual memory-to-register 
moves. Both types of parallel moves use a restricted subset of all available DSP56800 addressing 
modes, and the registers available for the move portion of the instruction are also a subset of the total 
set of DSP core registers. These subsets include the registers and addressing modes most frequently 
found in high performance numeric computation and DSP algorithms. Also, the parallel moves allow 
a move to occur only with an arithmetic operation in the data ALU. A parallel move is not permitted, 
for example, with a JMP, LEA, or BFSET instruction. 


Since the on-chip peripheral registers are accessed as locations in X data memory, there are many move 
instructions that can access these peripheral registers. Also, the case of “No Move Specified” for arith- 
metic operations optionally allows a parallel move. 


When a 36-bit accumulator (A or B) is specified as a source operand (S), there is a possibility that the
data may be limited. If the data out of the accumulator indicates that the accumulator extension bits are
in use, and the data is to be moved into a 16-bit destination, the value stored in the destination is limited
to a maximum positive or negative saturation constant to minimize truncation error. Limiting does not
occur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point
operations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).
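
The limiting rule can be modeled in a few lines of Python. This is a minimal sketch, not the hardware definition: it assumes that the extension is "in use" when bits 35 through 31 of the accumulator are not all equal, and that the saturation constants are $7FFF and $8000. The function name and the sample values are hypothetical.

```python
def move_accumulator_to_16bit(acc36):
    """Model a move of a full 36-bit accumulator (A or B) to a 16-bit destination.

    Returns (value16, limited).
    Assumption: the extension is in use when bits 35..31 are not all equal.
    Assumption: the saturation constants are $7FFF and $8000.
    """
    acc36 &= 0xFFFFFFFFF                 # keep 36 bits
    top5 = (acc36 >> 31) & 0x1F          # A2 (4 bits) plus the MSB of A1
    if top5 in (0x00, 0x1F):             # extension holds only copies of the sign
        return (acc36 >> 16) & 0xFFFF, False
    negative = bool(acc36 >> 35)         # bit 35 is the accumulator sign
    return (0x8000 if negative else 0x7FFF), True


ccr_l = False                            # sticky L bit: set here, never cleared
for acc in (0x055553333, 0x123456789, 0xF80000000):
    value, limited = move_accumulator_to_16bit(acc)
    ccr_l = ccr_l or limited
    print(f"${acc:09X} -> ${value:04X} limited={limited}")
print("L bit after the block:", int(ccr_l))
```

Because the L bit is only ever set inside the loop, one test after a whole block of moves is enough to tell whether any value was limited, which is the property block floating-point code relies on.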


When a 36-bit accumulator (A or B) is specified as a destination operand (D), any 16-bit source data
to be moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of
the source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign
extension and zeroing features may be circumvented by specifying the destination register to be one
of the individual 16-bit accumulator registers (A1 or B1).
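
As an illustration of the automatic extension, the sketch below builds the 36-bit value that results from loading a 16-bit word into a full accumulator. The helper names are hypothetical; the layout assumed is A2 in bits 35 to 32, A1 in bits 31 to 16, and A0 in bits 15 to 0.

```python
def load_accumulator(word16):
    """Return the 36-bit accumulator value produced by loading a 16-bit word."""
    word16 &= 0xFFFF
    ext = 0xF if word16 & 0x8000 else 0x0    # A2: four copies of bit 15
    return (ext << 32) | (word16 << 16)      # A0 is filled with 16 zeros


def split(acc36):
    """Split a 36-bit value into its (A2, A1, A0) portions."""
    return (acc36 >> 32) & 0xF, (acc36 >> 16) & 0xFFFF, acc36 & 0xFFFF


for word in (0x0116, 0xF456):
    a2, a1, a0 = split(load_accumulator(word))
    print(f"${word:04X} -> ${a2:X}:{a1:04X}:{a0:04X}")
```

Writing to A1 or B1 alone would correspond to replacing only the middle 16 bits and leaving the extension and LS portions as they were.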


The MOVE, MOVE(C), MOVE(I), MOVE(M), MOVE(P), and MOVE(S) descriptions are found on
the following pages. Detailed descriptions of the two parallel move types are covered under the MOVE
instruction. The Tcc and TFR descriptions are covered in their respective sections.
MOVE Parallel Move—Single Parallel Move MOVE


Operation:                Assembler Syntax:
<op> X:<ea> → D           <op> X:<ea>,D
<op> S → X:<ea>           <op> S,X:<ea>


<op> refers to any arithmetic instruction that allows parallel moves. Examples include ADD, DECW, MACR, NEG, 
SUB, TFR, and so on. 


Description: Perform a data ALU operation and, in parallel, move the specified register from or to X data memory. 
Two indirect addressing modes may be used (post-increment by one and post-increment by the offset 
register). 


Seventeen data ALU instructions allow the capability of specifying an optional single parallel move.
These data ALU instructions have been selected for optimal performance on the critical sections of
frequently used DSP algorithms. A summary of the different data ALU instructions, registers used for the
memory move, and addressing modes available for the single parallel move is shown in Table 6-34,
“Data ALU Instructions—Single Parallel Move,” on page 6-29.


If the arithmetic operation of the instruction specifies a given source register (S) or destination register 
(D), that same register or portion of that register may be used as a source in the parallel data bus move 
operation. This allows data to be moved in the same instruction in which it is being used as a source 
operand by a data ALU operation. That is, duplicate sources are allowed within the same instruction. 
Examples of duplicate sources include the following: 


ADD A,B A,X:(R2)+ ; A register allowed as source of
                  ; parallel move

ADD A,B X:(R2)+,A ; A register allowed as destination
                  ; of parallel move


If the arithmetic operation portion of the instruction specifies a given destination accumulator, that
same accumulator or portion of that accumulator may not be specified as a destination in the parallel
data bus move operation. Thus, if the opcode-operand portion of the instruction specifies the 36-bit A
or B accumulator as its destination, the parallel data bus move portion of the instruction may not
specify A0/B0, A1/B1, A2/B2, or A/B as its destination. That is, duplicate destinations are not allowed
within the same instruction. Examples of duplicate destinations include the following:


ADD B,A X:(R2)+,A ; NOT ALLOWED--A register used twice
                  ; as a destination

ASL A X:(R2)+,A   ; NOT ALLOWED--A register used twice
                  ; as a destination


Exceptions:
TST, CMP, and CMPM allow both the accumulator and its lower portion (A and A0, B and B0) to be
the parallel move destination even if this accumulator is used by the data ALU operation. These
instructions do not have a true destination.


Example:
ASL A A,X:(R3)+N ; save old value of A in X:(R3),
                 ; A*2 → A, update R3

Before Execution            After Execution
A        0 5555 3333        A        0 AAAA 6666
         A2 A1   A0                  A2 A1   A0
X:$00FF  1234               X:$00FF  5555
R3       00FF               R3       0103
N        0004               N        0004


Explanation of Example:
Prior to execution, the 16-bit R3 address register contains the value $00FF, the A accumulator contains
the value $0:5555:3333, and the 16-bit X memory location X:$00FF contains the value $1234.
Execution of the parallel move portion of the instruction, A,X:(R3)+N, uses the R3 address register to
move the contents of the A1 register before left shifting into the 16-bit X memory location (X:$00FF).
R3 is then updated by the value in the N register.
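
The ordering in this example (store the old value, then shift, then update the pointer) can be traced with a small Python model. It is a sketch under stated assumptions: linear address arithmetic on R3, no limiting (the extension portion of A is not in use here), and a dictionary standing in for X data memory. The function name is hypothetical.

```python
def asl_a_with_store(state, xmem):
    """Model of: ASL A  A,X:(R3)+N  using the pre-shift value of A as the move source."""
    a = state["A"] & 0xFFFFFFFFF
    xmem[state["R3"]] = (a >> 16) & 0xFFFF           # parallel move sees the old A1
    state["A"] = (a << 1) & 0xFFFFFFFFF              # arithmetic shift left by one bit
    state["R3"] = (state["R3"] + state["N"]) & 0xFFFF  # post-increment by N (linear, assumed)


state = {"A": 0x055553333, "R3": 0x00FF, "N": 0x0004}
xmem = {0x00FF: 0x1234}
asl_a_with_store(state, xmem)
print(f"A=${state['A']:09X}  X:$00FF=${xmem[0x00FF]:04X}  R3=${state['R3']:04X}")
```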


Condition Codes Affected: 


 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


SZ — Set according to the standard definition of the SZ bit during parallel move 
L — Set if data limiting has occurred during parallel move 
Instruction Fields:

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access    Source or Destination
Operation   Operands        X:(Rn)+          X0
                            X:(Rn)+N         Y1
                                             Y0
                                             A1
                                             B1
                                             A
                                             B
Timing: 2 
Memory: 1 program word for all instructions of this type 
MOVE Parallel Move—Dual Parallel Reads MOVE


Operation:                              Assembler Syntax:
<op> X:<ea> → D1 X:<ea> → D2            <op> X:<ea>,D1 X:<ea>,D2
MOVE X:<ea> → D1 X:<ea> → D2            MOVE X:<ea>,D1 X:<ea>,D2


where <op> refers to a limited set of arithmetic instructions which allow double parallel reads 


Description: Read two 16-bit word operands from X memory. Two independent effective addresses (ea) can be
specified where one of the effective addresses uses the R0 or R1 address register, while the other
effective address must use address register R3. Two parallel address updates are then performed, one for
each effective address. The address update on R3 is only performed using linear arithmetic, and the address
update on R0 or R1 is performed using linear or modulo arithmetic.


Six data ALU instructions (ADD, MAC, MACR, MPY, MPYR, and SUB) allow the capability of
specifying an optional dual memory read. In addition, MOVE can be specified. These data ALU
instructions have been selected for optimal performance on the critical sections of frequently used DSP
algorithms. A summary of the different data ALU instructions, registers used for the memory move,
and addressing modes available for the dual parallel read is shown in Table 6-35, “Data ALU
Instructions—Dual Parallel Read,” on page 6-30. When the MOVE instruction is selected, only the dual
memory accesses occur; no arithmetic operation is performed.


Example:
MPYR X0,Y0,A X:(R0)+,Y0 X:(R3)+,X0

Before Execution            After Execution
A        0 1234 5678        A        0 2AAA 0000
         A2 A1   A0                  A2 A1   A0
X:(R3)   CCCC               X:(R3)   CCCC
X:(R0)   BBBB               X:(R0)   BBBB
X0       4000               X0       CCCC
Y0       5555               Y0       BBBB


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000, and the 16-bit Y0 register contains
the value $5555. Execution of the parallel move portion of the instruction,
X:(R0)+,Y0 X:(R3)+,X0, moves the 16-bit value in the X memory location X:(R0) into the
register Y0, moves the 16-bit X memory location X:(R3) into the register X0, and post-increments by one
the 16-bit values in the R0 and R3 address registers. The multiplication is performed with the old
values of X0 and Y0, and the result is convergently rounded before storing it in the accumulator.
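
The following Python sketch reproduces the arithmetic of this example. It assumes the usual fractional alignment (the 32-bit signed product shifted left by one bit) and models convergent rounding as round-half-to-even on the A1 portion; see RND for the authoritative definition. The memory addresses $0010 and $0020 are hypothetical, since the example does not give the contents of R0 and R3.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def frac_mpy(s1, s2):
    """Signed fractional multiply: 16 x 16 bits into a 36-bit accumulator value."""
    product = (to_signed16(s1) * to_signed16(s2)) << 1   # fractional alignment (assumed)
    return product & 0xFFFFFFFFF


def round_convergent(acc36):
    """Round to the A1 boundary, ties to even, then clear A0 (assumed behaviour)."""
    tie = (acc36 & 0xFFFF) == 0x8000
    acc36 += 0x8000
    if tie:
        acc36 &= ~0x10000                # exactly halfway: force bit 0 of A1 to zero
    return acc36 & 0xFFFFF0000           # keep A2:A1, clear A0


def mpyr_dual_read(regs, xmem):
    """Model of: MPYR X0,Y0,A  X:(R0)+,Y0  X:(R3)+,X0"""
    result = round_convergent(frac_mpy(regs["X0"], regs["Y0"]))  # uses the old X0 and Y0
    regs["Y0"] = xmem[regs["R0"]]
    regs["X0"] = xmem[regs["R3"]]
    regs["R0"] = (regs["R0"] + 1) & 0xFFFF
    regs["R3"] = (regs["R3"] + 1) & 0xFFFF
    regs["A"] = result


regs = {"A": 0x012345678, "X0": 0x4000, "Y0": 0x5555, "R0": 0x0010, "R3": 0x0020}
xmem = {0x0010: 0xBBBB, 0x0020: 0xCCCC}
mpyr_dual_read(regs, xmem)
print(f"A=${regs['A']:09X}  X0=${regs['X0']:04X}  Y0=${regs['Y0']:04X}")
```

The unrounded product here is $0:2AAA:8000, an exact tie, so the even value $2AAA is kept rather than rounding up to $2AAB.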


Note: The second X data memory parallel read using the R3 address register can never access off-chip
memory or on-chip peripherals. It can only access on-chip X data memory.


Condition Codes Affected: 


 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C
L — Set if data limiting has occurred during parallel move 
Instruction Fields:

Data ALU Operation        First and Second Memory Reads   Destinations for Memory Reads
Operation   Registers     Read1        Read2              Destination1   Destination2
Operation   Operands      X:(R0)+      X:(R3)+            Y0             X0
                          X:(R0)+N     X:(R3)-            Y1             X0
                          X:(R1)+
                          X:(R1)+N                        Valid          Valid
                                                          destinations   destinations
                                                          for Read1      for Read2

Data ALU Operation        First and Second Memory Reads   Destinations for Memory Reads
Operation                 Read1        Read2              Destination1   Destination2
MOVE                      X:(R0)+      X:(R3)+            Y0             X0
                          X:(R0)+N     X:(R3)-            Y1             X0
                          X:(R1)+
                          X:(R1)+N                        Valid          Valid
                                                          destinations   destinations
                                                          for Read1      for Read2


Timing: 2 + mv oscillator clock cycles for all instructions of this type 


Memory: 1 program word for all instructions of this type 


MOVE(C) Move Control Register MOVE(C)


Operation:                Assembler Syntax:
X:<ea> → D                MOVE(C) X:<ea>,D
S → X:<ea>                MOVE(C) S,X:<ea>
S → D                     MOVE(C) S,D


Description: Move the contents of the specified source (control) register (S) to the specified destination, or move
the specified source to the specified destination (control) register (D). The control registers S and D
consist of the AGU registers, data ALU registers, and the program controller registers. These registers
may be moved to or from any other register or location in X data memory.


If the HWS is specified as a destination operand, the contents of the first HWS location are copied into
the second one, and the LF and NL bits are updated accordingly. If the HWS is specified as a source
operand, the contents of the second HWS location are copied into the first one, and the LF and NL bits
are updated accordingly. This allows more efficient manipulation of the HWS.


When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not
occur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point
operations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).


When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign
extension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).


Note: Due to pipelining, if an address register (Rn, SP, or M01) is changed with a MOVE or bit-field
instruction, the new contents will not be available for use as a pointer until the second following instruction.
If the SP is changed, no PUSH or POP instructions are permitted until the second following instruction.


Note: If the N address register is changed with a MOVE instruction, this register’s contents will be available
for use on the immediately following instruction. In this case the instruction that writes the N address
register will be stretched one additional instruction cycle. This is true for the case when the N register
is used by the immediately following instruction; if N is not used, then the instruction is not stretched
an additional cycle. If the N address register is changed with a bit-field instruction, the new contents
will not be available for use until the second following instruction.


Example:
MOVE(C) LC,X0 ; move the LC register into the X0 register

Before Execution            After Execution
LC   0100                   LC   0100
X0   0123                   X0   0100

Explanation of Example:
Execution of the MOVE(C) instruction moves the contents of the program controller’s 16-bit LC
register into the data ALU’s 16-bit X0 register.


Example:
MOVE(C) X:$CC00,N ; move X data memory value into the
                  ; N register

Before Execution            After Execution
N    0123                   N    0100

Explanation of Example:
Execution of the MOVE(C) instruction moves the contents of the X data memory at location $CC00
into the AGU’s 16-bit N register.


Example:
MOVE(C) R2,X:(R3+$3072) ; move R2 register into X data
                        ; memory

Before Execution            After Execution
X:$4072  1234               X:$4072  AAAA
R2       AAAA               R2       AAAA

Explanation of Example:
Prior to execution, R3 contains the value $1000. Execution of the MOVE(C) instruction moves the
AGU’s 16-bit R2 register contents into the X data memory at the location $4072.
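
The effective address in this example is the sum of the base register and the signed 16-bit index. A minimal Python sketch follows; the wrap-around to 16 bits is an assumption about the address adder, and the function name is hypothetical.

```python
def indexed_address(base, index16):
    """Effective address for X:(Rn+xxxx) with a signed 16-bit index."""
    index = index16 - 0x10000 if index16 & 0x8000 else index16
    return (base + index) & 0xFFFF       # assumed to wrap within 16 bits


print(f"${indexed_address(0x1000, 0x3072):04X}")   # R3 = $1000, index = $3072
```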


Restrictions:
• A MOVE(C) instruction used within a DO loop that specifies the HWS as the source or that specifies
  the SR or HWS as the destination cannot begin at the LA-2, LA-1, or LA within that DO loop.
• A MOVE(C) instruction that specifies the HWS as the source or as the destination cannot be used
  immediately before a DO instruction.
• A MOVE(C) instruction that specifies the HWS as the source or that specifies the SR or HWS as the
  destination cannot be used immediately before an ENDDO instruction.
• A MOVE(C) instruction that specifies the SR, HWS, or SP as the destination cannot be used
  immediately before an RTI or RTS instruction.
• A MOVE(C) HWS,HWS instruction is illegal and cannot be used.


Condition Codes Affected: 


 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C


If D is the SR: 
SZ — Set according to bit 7 of the source operand 
L — Set according to bit 6 of the source operand 
E — Set according to bit 5 of the source operand 
U — Set according to bit 4 of the source operand 
N — Set according to bit 3 of the source operand 
Z — Set according to bit 2 of the source operand 
V — Set according to bit 1 of the source operand 
C — Set according to bit 0 of the source operand 
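
The bit-to-flag mapping above can be written as a small lookup. This Python sketch only restates the list; the helper name and the sample value $0045 are hypothetical.

```python
CCR_FLAGS = ("C", "V", "Z", "N", "U", "E", "L", "SZ")   # bit 0 through bit 7


def ccr_from_source(word16):
    """Return the CCR flags that result from a MOVE(C) of word16 into the SR."""
    return {name: (word16 >> bit) & 1 for bit, name in enumerate(CCR_FLAGS)}


flags = ccr_from_source(0x0045)          # binary 0100 0101
print(flags)
```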


If D1 and D2 are not SR: 


L — Set if data limiting has occurred during move 
Instruction Fields:

Operation   Source or        Source or            Comments
            Destination      Destination
MOVE(C)     X:(Rn)           Any register         —
            X:(Rn)+
            X:(Rn)-
            X:(Rn)+N
            X:(SP)
            X:(SP)+
            X:(SP)-
            X:(SP)+N
            X:xxxx           Any register         16-bit absolute address
            X:(Rn+N)         Any register         —
            X:(SP+N)
            X:(Rn+xxxx)      Any register         Signed 16-bit index
            X:(SP+xxxx)
            X:(R2+xx)        X0, Y1, Y0,          —
            X:(SP-xx)        A, B, A1, B1,
                             R0-R3, N
            Any register     Any register         —
Timing: 2 + mvc oscillator clock cycles 
Memory: 1 + ea program words 
MOVE(I) Move Immediate MOVE(I)


Operation:                Assembler Syntax:
#xx → D                   MOVE(I) #xx,D
#xxxx → D                 MOVE(I) #xxxx,D
#xxxx → X:<ea>            MOVE(I) #xxxx,X:<ea>


Description: The 7-bit signed immediate operand is stored in the lowest 7 bits of the destination (D), and the upper 
bits are filled with sign extension. The destination can be any register, X data memory location, or 
on-chip peripheral register. 
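
The sign extension of the 7-bit operand can be sketched in Python as follows. The function name is hypothetical, and the model covers only a 16-bit destination (for an accumulator destination, the extension register and LS portion are handled as described in the Instruction Fields comments).

```python
def sign_extend_7bit(imm):
    """Extend a 7-bit signed immediate to a 16-bit word."""
    imm &= 0x7F
    return (imm | 0xFF80) if imm & 0x40 else imm   # bit 6 is the sign bit


for value in (0x47, 0x3F):
    print(f"#${value:02X} -> ${sign_extend_7bit(value):04X}")
```

The first value, $47, is the low 7 bits of $FFC7, so the short form can encode that constant.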


Example:
MOVE(I) #<$FFC7,X0 ; moves negative value into X0 since bit 6
                   ; is 1

Before Execution            After Execution
X0   1234                   X0   FFC7


Explanation of Example:
Prior to execution, X0 contains the value $1234. Execution of the instruction moves the value $FFC7
into X0.

Example:
MOVE(I) #$C33C,X:$A009 ; moves 16-bit value directly into a
                       ; memory location

Before Execution            After Execution
X:$A009  1234               X:$A009  C33C


Explanation of Example:
Prior to execution, the X data memory location $A009 contains the value $1234. Execution of the
instruction moves the value $C33C into this memory location.


Note: The MOVE(P) and MOVE(S) instructions also provide a mechanism for loading 16-bit immediate
values directly into the last 64 and first 64 locations, respectively, in X data memory.


Condition Codes Affected: 
The condition codes are not affected by this instruction. 
Instruction Fields:

Operation   Source   Destination   C   W   Comments
MOVE        #xx      HHHH          2   1   Signed 7-bit integer data (data is put in the
or                                         lowest 7 bits of the word portion of any
MOVEI                                      accumulator, upper 8 bits and extension
                                           reg are sign extended, LSP portion is set
                                           to “0”)
            #xxxx    DDDDD         4   2   Signed 16-bit immediate data. When LC is
                                           the destination, use 13-bit values only.
                     X:(R2+xx)     6   2
                     X:(SP-xx)     6   2
                     X:xxxx        6   3
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table 
MOVE(M) Move Program Memory MOVE(M)


Operation:                Assembler Syntax:
P:<ea> → D                MOVE(M) P:<ea>,D
S → P:<ea>                MOVE(M) S,P:<ea>


Description: Move the specified register from or to the specified program memory location. The source register (S) 
and destination registers (D) are data ALU registers. 


When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not
occur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point
operations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).


When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign
extension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).


Example:
MOVE(M) P:(R2)+N,A ; move P:(R2) into A, update R2 with N

Before Execution            After Execution
A        A 1234 5678        A        0 0116 0000
         A2 A1   A0                  A2 A1   A0
P:$0077  0116               P:$0077  0116
R2       $0077              R2       $007A


Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $A:1234:5678, R2 contains the value
$0077, the N register contains the value $0003, and the 16-bit program memory location P:(R2)
contains the value $0116. Execution of the MOVE(M) instruction moves the 16-bit program memory
location P:(R2) into the 36-bit A accumulator. R2 is then post-incremented by N.
Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C
L — Set if data limiting has occurred during the move 
Instruction Fields:

Operation   Source      Destination   Comments
MOVE(M)     P:(Rj)+     HHHH          Read signed word from program
            P:(Rj)+N                  memory
            HHHH        P:(Rj)+       Write word to program memory
                        P:(Rj)+N
Timing: 8 + mvm oscillator clock cycles 
Memory: 1 program word 
MOVE(P) Move Peripheral Data MOVE(P)


Operation:                Assembler Syntax:
X:<pp> → D                MOVE(P) X:<pp>,D
S → X:<pp>                MOVE(P) S,X:<pp>
#xxxx → X:<pp>            MOVE(P) #xxxx,X:<pp>


Description: Move the specified operand to or from a location in the last 64 words of the X data memory map. The
6-bit short absolute address is one-extended to generate a 16-bit address.


When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not
occur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point
operations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).


When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign
extension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).


Usage: This MOVE(P) instruction provides a more efficient way of accessing the last 64 locations in X
memory, which may be allocated to memory-mapped peripheral registers. Consult the specific
DSP56800-based device’s user manual for information on where in the memory map peripheral
registers are located.


Example:
MOVEP R1,X:<$FFE2 ; write to location X:$FFE2

Before Execution            After Execution
X:$FFE2  0123               X:$FFE2  5555
R1       5555               R1       5555


Explanation of Example:
Prior to execution, the location $FFE2 contains the value $0123. Execution of the
MOVE(P) R1,X:<$FFE2 instruction moves the value $5555 contained in the R1 register into the
location.


Example:
MOVEP #$0342,X:<$24 ; moves 16-bit value into location $FFE4

Before Execution            After Execution
X:$FFE4  AAAA               X:$FFE4  0342


Explanation of Example:
Prior to execution, the word at X data memory location $FFE4 contains the value $AAAA. The
MOVEP one-extends the value $24 to form the address $FFE4. Execution of the instruction moves the
value $0342 into this location.
Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C
L — Set if data limiting has occurred during move 
Note: It is also possible to access the last 64 locations in the X data memory map using the MOVE(C)
instruction, which can directly access these locations either using the address-register-indirect
addressing modes or the absolute address addressing mode, which specifies a 16-bit absolute address.


Instruction Fields:

Operation   Source   Destination   C   Comments
MOVE(P)     X:pp     HHHH          2   Last 64 locations in data memory.
            HHHH     X:pp          2   X:pp represents a 6-bit absolute I/O
                                       address. Refer to I/O Short
                                       Address (Direct Addressing):
                                       <pp> on page 4-23.
Timing: 2 + ea oscillator clock cycles 
Memory: 1 + ea program words 
MOVE(S) Move Absolute Short MOVE(S)


Operation:                Assembler Syntax:
X:<aa> → D                MOVE(S) X:<aa>,D
S → X:<aa>                MOVE(S) S,X:<aa>
#xxxx → X:<aa>            MOVE(S) #xxxx,X:<aa>


Description: Move the specified operand from or to the first 64 memory locations in X data memory. The 6-bit
absolute short address is zero-extended to generate a 16-bit X data memory address.


When a 36-bit accumulator (A or B) is specified as a source operand, there is a possibility that the data
may be limited. If the data out of the shifter indicates that the accumulator extension register is in use,
and the data is to be moved into a 16-bit destination, the value stored in the destination is limited to a
maximum positive or negative saturation constant to minimize truncation error. Limiting does not
occur if an individual 16-bit accumulator register (A1, A0, B1, or B0) is specified as a source operand
instead of the full 36-bit accumulator (A or B). This limiting feature allows block floating-point
operations to be performed with error detection since the L bit in the CCR is latched (that is, sticky).


When a 36-bit accumulator (A or B) is specified as a destination operand, any 16-bit source data to be
moved into that accumulator is automatically extended to 36 bits by sign extending the MSB of the
source operand (bit 15) and appending the source operand with 16 LS zeros. The automatic sign
extension and zeroing features may be circumvented by specifying the destination register to be one of
the individual 16-bit accumulator registers (A1 or B1).


Example:
MOVES X:<$0034,Y1 ; read from X:$0034

Before Execution            After Execution
X:$0034  5555               X:$0034  5555
Y1       0123               Y1       5555


Explanation of Example:
Prior to execution, X:$0034 contains the value $5555 and Y1 contains the value $0123. Execution of
the instruction moves the value $5555 into the Y1 register.


Example:
MOVES #$0342,X:<$24 ; moves 16-bit value directly into
                    ; memory location

Before Execution            After Execution
X:$0024  AAAA               X:$0024  0342


Explanation of Example:
Prior to execution, the X data memory location $0024 contains the value $AAAA. The
MOVES zero-extends the value $24 to form the memory address $0024. Execution of the instruction
moves the value $0342 into this location.
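
The two short-address forms differ only in how the 6-bit field is widened to 16 bits. The Python sketch below contrasts zero-extension (MOVE(S), <aa>) with one-extension (MOVE(P), <pp>); the function names are hypothetical.

```python
def aa_address(aa):
    """MOVE(S): zero-extend a 6-bit absolute short address."""
    return aa & 0x3F                     # upper 10 bits are zero: $0000-$003F


def pp_address(pp):
    """MOVE(P): one-extend a 6-bit I/O short address."""
    return 0xFFC0 | (pp & 0x3F)          # upper 10 bits are one: $FFC0-$FFFF


print(f"<aa> $24 -> X:${aa_address(0x24):04X}")
print(f"<pp> $24 -> X:${pp_address(0x24):04X}")
```

The same 6-bit value $24 therefore selects X:$0024 with MOVE(S) and X:$FFE4 with MOVE(P).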
Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C
SZ — Set according to the standard definition of the SZ bit 
L — Set if data limiting has occurred during move 
Note: It is also possible to access the first 64 locations in the X data memory using the MOVE(C) instruction,
which can directly access these locations either using the address-register-indirect addressing modes
or the absolute address addressing mode, which specifies a 16-bit absolute address.


Instruction Fields:

Operation   Source   Destination   C   Comments
MOVE(S)     X:aa     HHHH          2   First 64 locations in data memory.
            HHHH     X:aa          2   X:aa represents a 6-bit absolute
                                       address. Refer to Absolute Short
                                       Address (Direct Addressing):
                                       <aa> on page 4-22.
Timing: 2 + ea oscillator clock cycles 
Memory: 1 + ea program words 
MPY Signed Multiply MPY


Operation:                              Assembler Syntax:
±S1 * S2 → D (no parallel move)         MPY (±)S1,S2,D (no parallel move)
S1 * S2 → D (one parallel move)         MPY S1,S2,D (one parallel move)
S1 * S2 → D (two parallel reads)        MPY S1,S2,D (two parallel reads)


Description: Multiply the two signed 16-bit source operands (S1 and S2) and store the product in the specified 36-bit
destination accumulator (D). The “-” sign option is used to negate the specified product. This option
is not available when a single parallel move or two parallel read operations are performed or when D
is the 16-bit X0, Y1, or Y0.


Usage: This instruction is used for multiplication of fractional data or integer data when a full 32-bit product 
is required (see Section 3.3.5.2, “Integer Multiplication,” on page 3-20). When the destination is a 
16-bit register, this instruction is useful only for fractional data. 

Example:
MPY X0,Y1,A ; multiply X0 by Y1

Before Execution            After Execution
A    0 1000 0000            A    F FA2B 0000
     A2 A1   A0                  A2 A1   A0
X0   4000                   X0   4000
Y1   F456                   Y1   F456


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000 (0.5), the 16-bit Y1 register contains
the value $F456 (-0.0911255), and the 36-bit A accumulator contains the value $00:1000:0000
(0.125). Execution of the MPY X0,Y1,A instruction multiplies the 16-bit signed value in the X0
register by the 16-bit signed value in Y1 and stores the result ($F:FA2B:0000) into the A accumulator
(X0 * Y1 = -0.045562744140625).
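
The result in this example can be checked with a short Python model of the signed fractional multiply. It is a sketch that assumes the usual fractional alignment (the 32-bit signed product shifted left by one bit and sign-extended into the 36-bit accumulator); the function names are hypothetical.

```python
def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def frac_mpy(s1, s2):
    """Signed fractional multiply: 16 x 16 bits into a 36-bit accumulator value."""
    product = (to_signed16(s1) * to_signed16(s2)) << 1   # fractional alignment (assumed)
    return product & 0xFFFFFFFFF


def acc_to_fraction(acc36):
    """Convert a 36-bit accumulator value to its fractional reading."""
    if acc36 & (1 << 35):
        acc36 -= 1 << 36
    return acc36 / float(1 << 31)


acc = frac_mpy(0x4000, 0xF456)           # X0 = 0.5, Y1 = -0.0911255
print(f"${acc >> 32:X}:{(acc >> 16) & 0xFFFF:04X}:{acc & 0xFFFF:04X}", acc_to_fraction(acc))
```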


Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow (result) has occurred
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.
Instruction Fields:

Operation   Operands          C   W   Comments
MPY         (±)Y1,X0,FDD      2   1   Fractional multiply where one operand is optionally
            (±)Y0,X0,FDD              negated before multiplication
            (±)Y1,Y0,FDD
            (±)Y0,Y0,FDD              Note: Assembler also accepts first two operands
            (±)A1,Y0,FDD              when they are specified in opposite order
            (±)B1,Y1,FDD

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access    Source or Destination
MPY         Y1,B1,F         X:(Rn)+          X0
            Y0,Y0,F         X:(Rn)+N         Y1
            Y0,A1,F                          Y0
            X0,Y0,F                          A
            X0,Y1,F                          B
            Y0,Y1,F                          A1
                                             B1
            (F = A or B)

Data ALU Operation        First and Second Memory Reads   Destinations for Memory Reads
Operation   Registers     Read1        Read2              Destination1   Destination2
MPY         Y0,X0,F       X:(R0)+      X:(R3)+            Y0             X0
            Y1,X0,F       X:(R0)+N     X:(R3)-            Y1             X0
            Y1,Y0,F       X:(R1)+
                          X:(R1)+N                        Valid          Valid
                                                          destinations   destinations
                                                          for Read1      for Read2

Timing: 2 + mv oscillator clock cycles for MPY instructions with a parallel move
        Refer to previous table for MPY instructions without a parallel move

Memory: 1 program word for MPY instructions with a parallel move
        Refer to previous table for MPY instructions without a parallel move
MPYR Signed Multiply and Round MPYR


Operation:                                  Assembler Syntax:
±S1 * S2 + r → D (no parallel move)         MPYR (±)S1,S2,D (no parallel move)
S1 * S2 + r → D (two parallel reads)        MPYR S1,S2,D (two parallel reads)


Description: Multiply the two signed 16-bit source operands (S1 and S2), round the result using the specified
rounding, and store it in the specified 36-bit destination accumulator (D). (Refer to RND for more complete
information on the convergent rounding process.) The “-” sign option is used to negate the specified
product. The default sign option is “+”.


Usage: This instruction is used for multiplication and rounding of fractional data.

Example:
MPYR -X0,Y1,A ; multiply X0 by Y1 and negate the product

Before Execution            After Execution
A    0 1000 1234            A    F FE8B 0000
     A2 A1   A0                  A2 A1   A0
X0   4000                   X0   4000
Y1   F456                   Y1   F456


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $4000 (0.5), the 16-bit Y1 register contains
the value $F456 (-0.0911255), and the 36-bit A accumulator contains the value $00:1000:1234
(0.125002169981599). Execution of the MPYR -X0,Y1,A instruction multiplies the 16-bit signed
value in the X0 register by the 16-bit signed value in Y1, rounds the result, and stores the result
($FF:FE8B:0000) into the A accumulator (-X0 * Y1 = -0.011383056640625). In this example, the
default rounding (convergent rounding) is performed.


Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.
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MPYR MPYR 


Instruction Fields: 


Signed Multiply and Round 


Operation Operands Cc W Comments 
MPYR (+)Y¥1,X0,FDD 2 1 Fractional multiply where one operand is optionally 
(+) Y0,X0,FDD negated before multiplication; result is rounded 
(+)Y1,Y0,FDD 
(+) Y0,Y0,FDD Note: Assembler also accepts first two operands 
(+)A1,Y0,FDD when they are specified in opposite order 
(+)B1,Y1,FDD 
Data ALU Operation Parallel Memory Read or Write 
Operation Registers Memory Access Source or Destination 
MPYR Y1,B1,F X:(Rn)+ X0 
YO, Y0,F X:(Rn)+N Y1 
Y0,A1,F YO 
X0,Y0,F A 
X0,Y1,F B 
YO,Y1,F Al 
B1 
(F = AorB) 
Data ALU First and Second Memory Destinations for Memory 
Operation Reads Reads 
Operation Registers Read1 Read2 Destination1 Destination2 
MPYR YO,X0,F X:(RO)+ X:(R3)+ YO X0 
Y1,X0,F X:(RO)+N X:(R3)- 
Y1,Y0,F v1 x0 
X:(R1)+ ; : 
: Valid Valid 
Petes) AHN destinations destinations 
for Read for Read2 
Timing: 2 + mv oscillator clock cycles for MPYR instructions with a parallel move 


Refer to previous table for MPYR instructions without a parallel move 


Memory: 


1 program word for MPYR instructions with a parallel move 


Refer to previous table for MPYR instructions without a parallel move 
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A-131 


MPYSU Signed Unsigned Multiply MPYSU 


Operation:                                 Assembler Syntax:
S1 * S2 → D (S1 signed, S2 unsigned)       MPYSU S1,S2,D (no parallel move)


Description: Multiply the two 16-bit source operands (S1 and S2), and store the product in the specified 36-bit
destination accumulator (D). S1 is treated as signed; S2 is always considered unsigned. This mixed
arithmetic multiply does not allow a parallel move and can be used for multi-precision multiplications.


Usage: In addition to single-precision multiplication of a signed value times unsigned value, this instruction
is also used for multi-precision multiplications, as shown in Section 3.3.8.2, “Multi-Precision
Multiplication,” on page 3-23.


Example:
MPYSU X0,Y0,A
Before Execution          After Execution
0 0000 0000               0 3456 0000
A2 A1  A0                 A2 A1  A0
X0 3456                   X0 3456
Y0 8000                   Y0 8000


Explanation of Example:
The 16-bit X0 register contains the value $3456, and the 16-bit Y0 register contains the value $8000.
Execution of the MPYSU X0,Y0,A instruction multiplies the 16-bit signed value in the X0 register
by the 16-bit unsigned value in Y0 and stores the signed result into the A accumulator. If this were an
MPY instruction, Y0 ($8000) would equal -1.0, and the multiplication result would be
$F:CBAA:0000. Since this is an MPYSU instruction, Y0 is considered unsigned and equals +1.0. This
gives a multiplication result of $0:3456:0000.
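The two results quoted in this explanation can be reproduced with a small host-side model. The sketch below is written in Python rather than DSP56800 assembly, and the helper names (`to_signed16`, `frac_product`, `mpy`, `mpysu`) are illustrative assumptions, not part of any assembler or tool. It assumes that a fractional multiply shifts the integer product left by one bit and keeps 36 bits, which is consistent with the values shown in the example.

```python
MASK36 = (1 << 36) - 1  # 36-bit accumulator (A2:A1:A0)


def to_signed16(word):
    """Interpret a 16-bit word as a two's-complement value."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def frac_product(s1, s2):
    """Fractional product: integer product shifted left one bit, kept to 36 bits."""
    return ((s1 * s2) << 1) & MASK36


def mpy(s1_word, s2_word):
    """Model of a signed x signed fractional multiply (no rounding)."""
    return frac_product(to_signed16(s1_word), to_signed16(s2_word))


def mpysu(s1_word, s2_word):
    """Model of MPYSU: S1 is signed, S2 is unsigned."""
    return frac_product(to_signed16(s1_word), s2_word & 0xFFFF)


print(hex(mpysu(0x3456, 0x8000)))  # Y0 treated as unsigned
print(hex(mpy(0x3456, 0x8000)))    # Y0 treated as signed (-1.0)
```

Only the interpretation of the second operand differs between the two functions; the first operand is sign-extended in both.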


Condition Codes Affected:

|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

Operation   Operands     C   W   Comments
MPYSU       X0,Y1,FDD    2   1   Signed or unsigned 16x16 fractional multiply with
            X0,Y0,FDD            32-bit result.
            Y0,Y1,FDD
            Y0,Y0,FDD            The first operand is treated as signed and the
            Y0,A1,FDD            second as unsigned.
            Y1,B1,FDD

Timing: 2 oscillator clock cycles 

Memory: 1 program word
NEG Negate Accumulator NEG


Operation:                        Assembler Syntax:
0 - D → D (parallel move)         NEG D (parallel move)


Description: The destination operand (D) is subtracted from zero, and the two’s complement result is stored in the 
destination accumulator. 


Usage: This instruction is used for negating a 36-bit accumulator. It can also be used to negate a 16-bit value
loaded in the MSP of an accumulator if the LSP of the accumulator is $0000 (see Section 8.1.6,
“Unsigned Load of an Accumulator,” on page 8-7).


Example:
NEG B X0,X:(R3)+ ; 0 - B → B, save X0, update R3
Before Execution          After Execution
0 1234 5678               F EDCB A988
B2 B1  B0                 B2 B1  B0
SR 0300                   SR 0309


Explanation of Example: 
Prior to execution, the 36-bit B accumulator contains the value $0:1234:5678. The NEG B instruction 
takes the two’s-complement of the value in the B accumulator and stores the 36-bit result back in the 
B accumulator. 


Condition Codes Affected: 


|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result
C — Set if a borrow is generated from the MSB of the result


See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

Operation   Operands   C   W   Comments
NEG         F          2   1   Two’s-complement negation

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access   Source or Destination
NEG         A               X:(Rn)+         X0
            B               X:(Rn)+N        Y1
                                            Y0
                                            A1
                                            B1
                                            A
                                            B
Timing: 2 + mv oscillator clock cycles 
Memory: 1 program word
NOP No Operation NOP


Operation:                        Assembler Syntax:
PC + 1 → PC                       NOP


Description: Increment the PC. Pending pipeline actions, if any, are completed. Execution continues with the
instruction following the NOP.


Example: 
NOP ; increment the program counter 


Explanation of Example: 
The NOP instruction increments the PC and completes any pending pipeline actions. 


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Instruction Fields: 


Operation   Operands   C   W   Comments
NOP                    2   1   No operation
Timing: 2 oscillator clock cycles 
Memory: 1 program word
NORM Normalize Accumulator Iteration NORM


Operation:                                 Assembler Syntax:
If (E̅ * U * Z̅ = 1)                         NORM R0,D (no parallel move)
    then ASL D and Rn - 1 → Rn
else if (E = 1)
    then ASR D and Rn + 1 → Rn
else NOP

where X̅ denotes the logical complement of X and
where * denotes the logical AND operator


Description: Perform one normalization iteration on the specified destination operand (D), update the address
register R0 based upon the results of that iteration, and store the result back in the destination accumulator.
This is a 36-bit operation. If the accumulator extension is not in use, the accumulator is unnormalized,
and the accumulator is not zero, then the destination operand is arithmetically shifted 1 bit to the left,
and the specified address register is decremented by one. If the accumulator extension register is in use,
the destination operand is arithmetically shifted 1 bit to the right, and the specified address register is
incremented by one. If the accumulator is normalized or zero, a NOP is executed, and the specified
address register is not affected. Since the operation of the NORM instruction depends on the E, U, and
Z CCR bits, these bits must correctly reflect the current state of the destination accumulator prior to
executing the NORM instruction. The L and V bits in the CCR will be cleared unless they have been
improperly set up prior to executing the NORM instruction.


Example:
TST A
REP #31 ; maximum number of iterations (31) needed
NORM R0,A ; perform one normalization iteration
Before Execution          After Execution
0 0000 8000               0 4000 0000
A2 A1  A0                 A2 A1  A0
R0 0000                   R0 FFF1


Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $0:0000:8000, and the 16-bit R0
address register contains the value $0000. The repetition of the NORM R0,A instruction normalizes the
value in the 36-bit accumulator and stores the resulting number of shifts performed during that
normalization process in the R0 address register. A negative value reflects the number of left shifts performed,
while a positive value reflects the number of right shifts performed during the normalization process.
In this example, 15 left shifts are required for normalization.
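The iteration described above can be traced with a short host-side model. This is a minimal sketch in Python, not DSP56800 code, and the function names are hypothetical. It assumes the usual definitions of the condition bits: E is set when bits 35-31 are not all equal, U is set when bits 31 and 30 are equal, and Z is set when the accumulator is zero. On the device these bits come from the CCR; the model recomputes them before every step.

```python
MASK36 = (1 << 36) - 1  # 36-bit accumulator


def flags(acc):
    """Return (E, U, Z) for a 36-bit accumulator value (assumed definitions)."""
    top5 = (acc >> 31) & 0x1F
    e = top5 not in (0x00, 0x1F)                   # extension in use
    u = ((acc >> 31) & 1) == ((acc >> 30) & 1)     # unnormalized
    z = acc == 0
    return e, u, z


def norm_step(acc, r0):
    """One NORM iteration: returns the new accumulator and the new R0."""
    e, u, z = flags(acc)
    if not e and u and not z:
        return (acc << 1) & MASK36, (r0 - 1) & 0xFFFF   # ASL, decrement R0
    if e:
        sign = acc & (1 << 35)
        return (acc >> 1) | sign, (r0 + 1) & 0xFFFF     # ASR, increment R0
    return acc, r0                                      # NOP


def normalize(acc, r0=0, iterations=31):
    """Repeat NORM the way REP #31 followed by NORM R0,A would."""
    for _ in range(iterations):
        acc, r0 = norm_step(acc, r0)
    return acc, r0


acc, r0 = normalize(0x0_0000_8000)
print(hex(acc), hex(r0))
```

Once the value is normalized, the remaining iterations fall through to the NOP branch, which is why a fixed repeat count of 31 is safe.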


Condition Codes Affected:

|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

L — Set if overflow has occurred in A or B result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if bit 35 is changed as a result of a left shift


See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

Operation   Operands   C   W   Comments
NORM        R0,F       2   1   Normalization iteration instruction for normalizing
                               the F accumulator


Timing: 2 oscillator clock cycles 


Memory: 1 program word
NOT Logical Complement NOT


Operation:                                    Assembler Syntax:
D̅ → D (no parallel move)                      NOT D (no parallel move)
D̅[31:16] → D[31:16] (no parallel move)        NOT D (no parallel move)

where the bar over the D (D̅) denotes the logical NOT operator


Description: Take the one’s-complement of the destination operand (D) and store the result in the destination. This
instruction is a 16-bit operation. If the destination is a 36-bit accumulator, the one’s-complement is
performed on bits 31-16 of the accumulator. The remaining bits of the destination accumulator are not
affected.


Example:
NOT A A,X:(R2)+ ; save A1 and take the 1’s complement of A1
Before Execution          After Execution
5 1234 5678               5 EDCB 5678
A2 A1  A0                 A2 A1  A0
SR 0300                   SR 0300


Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $5:1234:5678. The NOT A instruction
takes the one’s-complement of bits 31-16 of the A accumulator (A1) and stores the result back in the
A1 register. The remaining A accumulator bits are not affected.


Condition Codes Affected: 


|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

N — Set if bit 31 of A or B result is set
Z — Set if bits 31-16 of A or B result are zero
V — Always cleared


See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

Operation   Operands   C   W   Comments
NOT         FDD        2   1   One’s-complement (bit-wise negation)
Timing: 2 oscillator clock cycles 
Memory: 1 program word
NOTC Logical Complement with Carry NOTC


Operation:                        Assembler Syntax:
X̅:<ea> → X:<ea>                   NOTC X:<ea>
D̅ → D                             NOTC D


Implementation Note:
This instruction is an alias to the BFCHG instruction, and assembles as BFCHG with the 16-bit immediate
mask set to $FFFF. This instruction will disassemble as a BFCHG instruction.


Description: Take the one’s complement of the destination operand (D), and store the result in the destination. This 
instruction is a 16-bit operation. If the destination is a 36-bit accumulator, the one’s-complement is 
performed on bits 31—16 of the accumulator. The remaining bits of the destination accumulator are not 
affected. C is also modified as described in following discussion. 


Example: 
NOTC R2 
Before Execution After Execution 
R2 CAA3 R2 355C 
SR 3456 SR 3456 


Explanation of Example: 
Prior to execution, the R2 register contains the value $CAA3. Execution of the instruction comple- 
ments the value in R2. C is modified as described in following discussion. 
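The alias relationship can be illustrated with a small Python model. This is a sketch under stated assumptions: `bfchg` and `notc` are hypothetical helper names, the model covers only a 16-bit register or memory operand, and it assumes the bit-field carry rule (C set when every bit selected by the mask was set before the operation), which for a mask of $FFFF reduces to "the value equals $FFFF".

```python
def bfchg(mask, value):
    """Complement the bits of a 16-bit value selected by mask.

    Returns (result, carry); carry is 1 only if every masked bit was set beforehand.
    """
    value &= 0xFFFF
    carry = int((value & mask) == mask)
    return (value ^ mask) & 0xFFFF, carry


def notc(value):
    """NOTC modelled as BFCHG with the mask fixed at $FFFF."""
    return bfchg(0xFFFF, value)


result, carry = notc(0xCAA3)
print(hex(result), carry)
```

With the operand from the example ($CAA3) the carry stays clear, because the value was not $FFFF before the complement.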


Condition Codes Affected: 


|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

For destination operand SR:
— Changed if specified in the field
For other destination operands:
C — Set if the value equals $FFFF before the complement


Instruction Fields:

Operation   Operands     C   W   Comments
NOTC        DDDDD        4   2   One’s-complement (bit-wise negation).
            X:(R2+xx)    6   2   All registers in DDDDD are permitted except HWS.
            X:(SP-xx)    6   2   X:aa represents a 6-bit absolute address. Refer to
            X:aa         4   2   Absolute Short Address (Direct Addressing): <aa>
            X:pp         4   2   on page 4-22.
            X:xxxx       6   3   X:pp represents a 6-bit absolute I/O address. Refer
                                 to I/O Short Address (Direct Addressing): <pp>
                                 on page 4-23.
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table
OR Logical Inclusive OR OR


Operation:                                       Assembler Syntax:
S + D → D (no parallel move)                     OR S,D (no parallel move)
S + D[31:16] → D[31:16] (no parallel move)       OR S,D (no parallel move)

where + denotes the logical inclusive OR operator


Description: Logically OR the source operand (S) with the destination operand (D) and store the result in the
destination. This instruction is a 16-bit operation. If the destination is a 36-bit accumulator, the source is
ORed with bits 31-16 of the accumulator. The remaining bits of the destination accumulator are not
affected.


Usage: This instruction is used for the logical OR of two registers. If it is desired to OR a 16-bit immediate
value with a register or memory location, then the ORC instruction is appropriate.


Example:
OR Y1,B ; OR Y1 with B
Before Execution          After Execution
0 1234 5678               0 FF34 5678
B2 B1  B0                 B2 B1  B0
Y1 FF00                   Y1 FF00


Explanation of Example:
Prior to execution, the 16-bit Y1 register contains the value $FF00, and the 36-bit B accumulator
contains the value $0:1234:5678. The OR Y1,B instruction logically ORs the 16-bit value in the Y1
register with B1 and stores the 36-bit result in the B accumulator.
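Because the operation touches only bits 31-16 of an accumulator destination, it helps to see the masking spelled out. The Python sketch below is a host-side illustration only; `or_msp` is a hypothetical name, and the accumulator is represented as a plain 36-bit integer.

```python
MSP_MASK = 0xFFFF << 16  # bits 31-16 of the 36-bit accumulator


def or_msp(source, acc):
    """OR a 16-bit source into bits 31-16 of a 36-bit accumulator.

    Returns (new_acc, n, z), where n and z follow the rules listed for
    this instruction: N mirrors bit 31, Z reflects bits 31-16 only.
    """
    msp = ((acc >> 16) | source) & 0xFFFF
    new_acc = (acc & ~MSP_MASK) | (msp << 16)
    return new_acc, msp >> 15, int(msp == 0)


new_b, n, z = or_msp(0xFF00, 0x0_1234_5678)
print(hex(new_b), n, z)
```

The extension (B2) and least significant (B0) portions pass through unchanged, which is why B0 can be nonzero while Z is still set.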


Condition Codes Affected: 


|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

N — Set if bit 31 of A or B result is set
Z — Set if bits 31-16 of A or B result are zero
V — Always cleared


Instruction Fields:

Operation   Operands   C   W   Comments
OR          DD,FDD     2   1   16-bit logical OR
            F1,DD
Timing: 2 oscillator clock cycles 
Memory: 1 program word
ORC Logical Inclusive OR Immediate ORC


Operation:                        Assembler Syntax:
#xxxx + X:<ea> → X:<ea>           ORC #iiii,X:<ea>
#xxxx + D → D                     ORC #iiii,D

where + denotes the logical inclusive OR operator


Implementation Note:
This instruction is an alias to the BFSET instruction, and assembles as BFSET with the 16-bit immediate
value used as the bit mask. This instruction will disassemble as a BFSET instruction.


Description: Logically OR a 16-bit immediate data value with the destination operand (D) and store the results back 
into the destination. C is also modified as described in following discussion. This instruction performs 
a read-modify-write operation on the destination and requires two destination accesses. 


Example: 
ORC #$5050,X:<<$7C30 ; OR with immediate data
Before Execution          After Execution
X:$7C30 00AA              X:$7C30 50FA
SR 0300                   SR 0300


Explanation of Example:
Prior to execution, the 16-bit X memory location X:$7C30 contains the value $00AA. Execution of the
instruction tests the state of bits 14, 12, 6, and 4 in X:$7C30; does not set C (because these bits were
not all set); and then sets the bits.
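The test-then-set behaviour can be checked with a short Python model. It is a sketch only: `bfset` and `mask_bits` are hypothetical helper names, the operand is a plain 16-bit integer, and the SR-destination case is not modelled.

```python
def mask_bits(mask):
    """List the bit positions selected by a 16-bit mask."""
    return [bit for bit in range(16) if (mask >> bit) & 1]


def bfset(mask, value):
    """Test the masked bits, then set them.

    Returns (result, carry); carry is 1 only if all masked bits were
    already set before the write.
    """
    value &= 0xFFFF
    carry = int((value & mask) == mask)
    return (value | mask) & 0xFFFF, carry


print(mask_bits(0x5050))
print(bfset(0x5050, 0x00AA))
```

Running the same operation a second time on the updated value would set C, since every masked bit is then already set.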


Condition Codes Affected: 


|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

For destination operand SR:
— Set as defined in the field and if specified in the field
For other destination operands:
C — Set if all bits specified by the mask are set


Instruction Fields:

Operation   Operands           C   W   Comments
ORC         #xxxx,DDDDD        4   2   16-bit logical OR of immediate data.
            #xxxx,X:(R2+xx)    6   2   All registers in DDDDD are permitted except HWS.
            #xxxx,X:(SP-xx)    6   2   X:aa represents a 6-bit absolute address. Refer to
            #xxxx,X:aa         4   2   Absolute Short Address (Direct Addressing): <aa>
            #xxxx,X:pp         4   2   on page 4-22.
            #xxxx,X:xxxx       6   3   X:pp represents a 6-bit absolute I/O address. Refer
                                       to I/O Short Address (Direct Addressing): <pp>
                                       on page 4-23.
Timing: Refer to the preceding Instruction Fields table 
Memory: Refer to the preceding Instruction Fields table
POP Pop from Stack POP


Operation:                        Assembler Syntax:
X:(SP) → D                        POP D
SP - 1 → SP


Description: Read one location from the software stack into a destination register (D) and post-decrement the SP.


Implementation Note:
This instruction is implemented by the assembler using either a MOVE or LEA instruction, depending
on the form. When a destination register is specified, a MOVE (SP)-,<register> instruction is
assembled. When no destination register is specified, POP assembles as LEA (SP)-. The instruction
will always disassemble as either MOVE or LEA.


Example:
POP LC
Before Execution          After Execution
X:$0100 AAAA              X:$0100 AAAA
LC 0099                   LC AAAA
SP 0100                   SP 00FF


Explanation of Example: 
Prior to execution, the LC register contains the value $0099, and the SP contains the value $0100. The 
POP instruction reads from the location in X data memory pointed to by the SP and places this value 
in the LC register. The SP is then decremented after the read from memory. 


Condition Codes Affected: 


The condition codes are not affected by this instruction. 


Instruction Fields: 


Operation   Operands                   C   W   Comments
POP         Any register               2   1   Pop a single stack location
            (No register specified)    2   1   Simply decrements the SP
Timing: 2 oscillator clock cycles 
Memory: 1 program word
REP Repeat Next Instruction REP


Operation:                                 Assembler Syntax:
LC → TEMP; #xx → LC                        REP #xx
Repeat next instruction until LC = 1
TEMP → LC

LC → TEMP; S → LC                          REP S
Repeat next instruction until LC = 1
TEMP → LC


Description: Repeat the single word instruction immediately following the REP instruction the specified number of 
times. The value specifying the number of times the given instruction is to be repeated is loaded into 
the 13-bit LC register. The contents of the 13-bit LC register are treated as unsigned (that is, always 
positive). The single word instruction is then executed the specified number of times, decrementing 
the LC after each execution until LC equals one. When the REP instruction is in effect, the repeated 
instruction is fetched only one time, and it remains in the instruction register for the duration of the 
loop count. Thus, the REP instruction is not interruptible. The contents of the LC register upon entering 
the REP instruction are stored in an internal temporary register and are restored into the LC register 
upon exiting the REP loop. If LC is set equal to zero, the instruction is not repeated and execution
continues with the instruction immediately following the instruction that was to be repeated. The
instruction’s effective address specifies the address of the value that is to be loaded into the LC.


The REP instruction allows all registers on the DSP core to specify the number of loop iterations except 
for the following: M01, HWS, OMR, and SR. If immediate short data is instead used to specify the 
loop count, the 6 LSBs of the LC register are loaded from the instruction and the upper 7 MSBs are 
cleared. 


Note: If the A or B accumulator is specified as a source operand, and the data out of the accumulator indicates 
that extension is in use, the value to be loaded into the LC register will be limited to a 16-bit maximum 
positive or negative saturation constant. If positive saturation occurs, the limiter places $7FFF onto the 
bus, and the lower 13 bits of this value are all ones. The 13 ones are loaded into the LC register as the 
maximum unsigned positive loop count allowed. If negative saturation occurs, the limiter places $8000 
onto the bus, and the lower 13 bits of this value are all zeros. The 13 zeros are loaded into the LC
register, specifying a loop count of zero. The A and B accumulators remain unchanged.


Note: Once in progress, the REP instruction and the REP loop may not be interrupted until completion of the 
REP loop. 
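The loop-count rules above (6-bit immediate, 13-bit register value, limiting for accumulator sources, LC saved and restored) can be summarized in a short Python model. This is a sketch, not device code: all function names are hypothetical, and "extension in use" is assumed to mean that bits 35-31 of the accumulator are not all equal.

```python
LC_MASK = 0x1FFF  # the LC register is 13 bits wide


def lc_from_immediate(value):
    """Immediate short data: only the 6 LSBs are loaded, the upper bits are cleared."""
    return value & 0x3F


def lc_from_register(value):
    """Register source: the lower 13 bits are loaded into LC."""
    return value & LC_MASK


def lc_from_accumulator(acc):
    """Accumulator source: the value is limited first if the extension is in use."""
    top5 = (acc >> 31) & 0x1F
    if top5 not in (0x00, 0x1F):                      # extension in use (assumed test)
        limited = 0x8000 if acc & (1 << 35) else 0x7FFF
    else:
        limited = (acc >> 16) & 0xFFFF
    return limited & LC_MASK


def rep(count, lc, body, state):
    """Run body(state) count times; return the final state and the restored LC."""
    saved_lc = lc                                     # LC -> TEMP
    for _ in range(count):
        state = body(state)
    return state, saved_lc                            # TEMP -> LC


def incw(word):
    """Increment a 16-bit word, wrapping on overflow."""
    return (word + 1) & 0xFFFF


print(rep(lc_from_register(0x0003), 0x00A5, incw, 0x0000))
```

A count of zero leaves the state untouched, matching the rule that the repeated instruction is skipped when LC is loaded with zero.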


Restrictions: 
The REP instruction can repeat any single word instruction except the REP instruction itself and any
instruction that changes program flow. The following instructions are not allowed to follow a REP
instruction:

Any instruction that occupies multiple words
DO                Bcc, Jcc
BRCLR, BRSET      BRA, JMP
MOVEM             JSR
REP               RTI
RTS               STOP, WAIT
SWI, DEBUG        Tcc

Also, a REP instruction cannot be the last instruction in a DO loop (at the LA). The assembler will
generate an error if any of the preceding instructions are found immediately following a REP
instruction.


Example:
REP X0 ; repeat (X0) times
INCW Y1 ; increment the Y1 register

Before Execution          After Execution
X0 0003                   X0 0003
Y1 0000                   Y1 0003
LC 00A5                   LC 00A5


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $0003, and the 16-bit LC register contains
the value $00A5. Execution of the REP X0 instruction takes the lower 13 bits of the value in the X0
register and stores it in the 13-bit LC register. Then, the single word INCW instruction immediately
following the REP instruction is repeated $0003 times. The contents of the LC register before the REP
loop are restored upon exiting the REP loop.


Example:
REP X0 ; repeat (X0) times
INCW Y1 ; increment the Y1 register
ASL Y1 ; multiply the Y1 register by 2

Before Execution          After Execution
X0 0000                   X0 0000
Y1 0005                   Y1 000A
LC 00A5                   LC 00A5


Explanation of Example:
Prior to execution, the 16-bit X0 register contains the value $0000, and the 16-bit LC register contains
the value $00A5. Execution of the REP X0 instruction takes the lower 13 bits of the value in the X0
register and stores it in the 13-bit LC register. Since the loop count is zero, the single word INCW
instruction immediately following the REP instruction is skipped and execution continues with the ASL
instruction. The contents of the LC register before the REP loop are restored upon exiting the REP
loop.


Condition Codes Affected:

|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

L — Set if data limiting occurred using A or B as source operands


Instruction Fields:

Operation   Operands   C   W   Comments
REP         #xx        6   1   Hardware repeat of a one-word instruction with
                               immediate loop count
            DDDDD      6   1   Hardware repeat of a one-word instruction with loop
                               count specified in register
                               Any register allowed except: SP, M01, SR, OMR,
                               and HWS
Timing: 6 oscillator clock cycles 
Memory: 1 program word
RND Round Accumulator RND


Operation:                        Assembler Syntax:
D + r → D (parallel move)         RND D (parallel move)


Description: Round the 36-bit value in the specified destination operand (D), store the result in the EXT and MSPs
of the destination accumulator (A2:A1 or B2:B1), and clear the LSP of the accumulator. This
instruction uses the rounding technique selected by the R bit in the OMR. When the R bit in OMR is cleared
(default mode), convergent rounding is selected; when the R bit is set, two’s-complement rounding is
selected. The rounding constant is added into bit 15 of the destination. Refer to Section 3.5,
“Rounding,” on page 3-30 for more information about the rounding modes.


Example: 
RND A ; round A accumulator into A2:A1, zero A0
        Before Execution          After Execution
I       5 1236 789A               5 1236 0000
        A2 A1  A0                 A2 A1  A0
        Before Execution          After Execution
II      0 1236 8000               0 1236 0000
        A2 A1  A0                 A2 A1  A0
        Before Execution          After Execution
III     0 1235 8000               0 1236 0000
        A2 A1  A0                 A2 A1  A0


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $5:1236:789A for Case I, the value
$0:1236:8000 for Case II and the value $0:1235:8000 for Case III. Execution of the RND A instruction
rounds the value in the A accumulator into the MSP of the A accumulator (A1) and then zeros the LSP
of the A accumulator (A0). The example is given assuming that the convergent rounding is selected.
Case II is the special case that distinguishes convergent rounding from the two’s-complement
rounding, since it clears the LSB of the MSP after the rounding operation is performed.
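The three cases can be reproduced with a short Python model. It is a sketch under stated assumptions: `rnd` is a hypothetical helper, the accumulator is a plain 36-bit integer, and convergent rounding is modelled as "add the rounding constant at bit 15, then clear bit 16 if the LSP was exactly $8000, then clear the LSP".

```python
MASK36 = (1 << 36) - 1  # 36-bit accumulator


def rnd(acc, convergent=True):
    """Round a 36-bit accumulator value into A2:A1 and clear A0.

    convergent=True models the default mode (R bit cleared);
    convergent=False models two's-complement rounding (R bit set).
    """
    total = (acc + 0x8000) & MASK36          # add rounding constant at bit 15
    if convergent and (acc & 0xFFFF) == 0x8000:
        total &= ~(1 << 16)                  # exact tie: force the LSB of the MSP to 0
    return total & ~0xFFFF & MASK36          # clear the LSP


for case in (0x5_1236_789A, 0x0_1236_8000, 0x0_1235_8000):
    print(hex(case), "->", hex(rnd(case)))
```

Calling `rnd(0x0_1236_8000, convergent=False)` shows the difference for Case II: two's-complement rounding rounds the tie up instead of to the even value.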


Condition Codes Affected:

|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result


Note: If the CC bit is set and bit 31 of the result is set, then N is set. If the CC bit is set and bits 31-0 of the
result equal zero, then Z is set. The rest of the bits are unaffected by the setting of the CC bit.


Instruction Fields:

Operation   Operands   C   W   Comments
RND         F          2   1   Round

Data ALU Operation          Parallel Memory Read or Write
Operation   Registers       Memory Access   Source or Destination
RND         A               X:(Rn)+         X0
            B               X:(Rn)+N        Y1
                                            Y0
                                            A1
                                            B1
                                            A
                                            B
Timing: 2 + mv oscillator clock cycles 
Memory: 1 program word
ROL Rotate Left ROL


Operation:                        Assembler Syntax:
(see following figure)            ROL D (parallel move)

[Figure: the 16 bits of D1 rotate left by 1 bit through C; D2 and D0 are unchanged]


Description: Logically shift 16 bits of the destination operand (D) 1 bit to the left, and store the result in the
destination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator (A1
or B1), and the remaining portions of the accumulator (A2, B2, A0, and B0) are not modified. The
MSB of the destination (bit 31 if the destination is a 36-bit accumulator) prior to the execution of the
instruction is shifted into C, and the previous value of C is shifted into the LSB of the destination (bit
16 if the destination is a 36-bit accumulator).


Example:
ROL A ; rotate A1 left 1 bit
Before Execution          After Execution
F 0000 00AA               F 0001 00AA
A2 A1  A0                 A2 A1  A0
SR 0001                   SR 0000


Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $F:0000:00AA, and C is set. Execution of the
ROL A instruction shifts the 16-bit value in the A1 register 1 bit to the left, shifting bit 31 into C,
rotating C into bit 16, and storing the result back in the A1 register.
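The rotate-through-carry behaviour on an accumulator destination can be modelled in a few lines of Python. This is a host-side sketch; `rol` is a hypothetical name and the accumulator is a plain 36-bit integer. The first call uses an A1 value of $0000 with C set, which is the starting state implied by the before and after register values in the example.

```python
MSP_MASK = 0xFFFF << 16  # bits 31-16 of the 36-bit accumulator


def rol(acc, carry):
    """Rotate bits 31-16 of a 36-bit accumulator left by one bit through C.

    Returns (new_acc, new_carry). The extension and LSP bits are untouched.
    """
    msp = (acc >> 16) & 0xFFFF
    new_carry = msp >> 15                      # old bit 31 goes to C
    rotated = ((msp << 1) | carry) & 0xFFFF    # old C enters at bit 16
    return (acc & ~MSP_MASK) | (rotated << 16), new_carry


print(rol(0xF_0000_00AA, 1))
print(rol(0x0_8000_0000, 0))
```

Applying `rol` seventeen times in a row returns both the MSP and C to their starting values, since the rotation covers 16 data bits plus the carry.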


Condition Codes Affected:

|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

N — Set if bit 31 of A or B result is set
Z — Set if bits 31-16 of A or B result are zero
V — Always cleared
C — Set if bit 31 of A or B was set prior to the execution of the instruction


Instruction Fields:

Operation   Operands   C   W   Comments
ROL         FDD        2   1   Rotate 16-bit register left by 1 bit through the carry
                               bit
Timing: 2 oscillator clock cycles 
Memory: 1 program word
ROR Rotate Right ROR


Operation:                        Assembler Syntax:
(see following figure)            ROR D (parallel move)

[Figure: the 16 bits of D1 rotate right by 1 bit through C; D2 and D0 are unchanged]


Description: Logically shift 16 bits of the destination operand (D) 1 bit to the right and store the result in the
destination. If the destination is a 36-bit accumulator, the result is stored in the MSP of the accumulator (A1
or B1), and the remaining portions of the accumulator (A2, B2, A0, and B0) are not modified. The LSB
of the destination (bit 16 if the destination is a 36-bit accumulator) prior to the execution of the
instruction is shifted into C, and the previous value of C is shifted into the MSB of the destination (bit 31 if
the destination is a 36-bit accumulator).


Example:
ROR B ; rotate B1 right 1 bit
Before Execution          After Execution
F 0001 00AA               F 0000 00AA
B2 B1  B0                 B2 B1  B0
SR 0000                   SR 0005


Explanation of Example:
Prior to execution, the 36-bit B accumulator contains the value $F:0001:00AA. Execution of the
ROR B instruction shifts the 16-bit value in the B1 register 1 bit to the right, shifting bit 16 into C,
rotating C into bit 31, and storing the result back in the B1 register.
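The SR change in this example ($0000 to $0005) follows from the condition-code rules for the rotate: Z is set because B1 becomes zero, and C receives the old bit 16. The Python sketch below models only the N, Z, V, and C bits (CCR bits 3 to 0); `ror` is a hypothetical helper and the accumulator is a plain 36-bit integer.

```python
MSP_MASK = 0xFFFF << 16  # bits 31-16 of the 36-bit accumulator


def ror(acc, carry):
    """Rotate bits 31-16 of a 36-bit accumulator right by one bit through C.

    Returns (new_acc, ccr_low), where ccr_low packs N (bit 3), Z (bit 2),
    V (bit 1, always cleared), and C (bit 0).
    """
    msp = (acc >> 16) & 0xFFFF
    new_carry = msp & 1                        # old bit 16 goes to C
    rotated = (msp >> 1) | (carry << 15)       # old C enters at bit 31
    new_acc = (acc & ~MSP_MASK) | (rotated << 16)
    n = rotated >> 15
    z = int(rotated == 0)
    return new_acc, (n << 3) | (z << 2) | new_carry


print(ror(0xF_0001_00AA, 0))
```

Z depends only on bits 31-16, so it is set here even though B2 and B0 are nonzero.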


Condition Codes Affected:

|<-------- MR -------->||<------- CCR -------->|
 15 14 13 12 11 10  9  8  7  6  5  4  3  2  1  0
 LF  *  *  *  *  * I1 I0 SZ  L  E  U  N  Z  V  C

N — Set if bit 31 of A or B result is set
Z — Set if bits 31-16 of A or B result are zero
V — Always cleared
C — Set if bit 16 of A or B was set prior to the execution of the instruction


Instruction Fields:

Operation   Operands   C   W   Comments
ROR         FDD        2   1   Rotate 16-bit register right by 1 bit through the carry
                               bit
Timing: 2 oscillator clock cycles 
Memory: 1 program word
RTI Return from Interrupt RTI


Operation:                        Assembler Syntax:
X:(SP) → SR; SP - 1 → SP          RTI
X:(SP) → PC; SP - 1 → SP


Description: Pull the SR and the PC from the software stack. The previous PC is lost.


Example:
RTI ; pull the SR and PC registers from the stack
Before Execution          After Execution
X:$0100 $1300             X:$0100 $1300
X:$00FF $754C             X:$00FF $754C
SR $0309                  SR $1300
SP $0100                  SP $00FE


Explanation of Example:
The RTI instruction pulls the 16-bit PC and the 16-bit SR from the stack and updates the system SP.
Program execution continues at $754C.
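The order of the two stack reads can be made explicit with a small Python model. It is a sketch only: the `pop` and `rti` helpers are hypothetical, and X data memory is represented by a dictionary keyed by address.

```python
def pop(memory, sp):
    """Read X:(SP), then post-decrement SP (16-bit wraparound)."""
    return memory[sp], (sp - 1) & 0xFFFF


def rti(memory, sp):
    """Model of RTI: pull SR first, then PC, from the software stack.

    Returns (sr, pc, sp) after the instruction completes.
    """
    sr, sp = pop(memory, sp)
    pc, sp = pop(memory, sp)
    return sr, pc, sp


stack = {0x0100: 0x1300, 0x00FF: 0x754C}
sr, pc, sp = rti(stack, 0x0100)
print(hex(sr), hex(pc), hex(sp))
```

The stack contents themselves are not modified; only the SP moves, which matches the unchanged memory values in the example.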


Restrictions: 
Due to pipelining in the program controller and the fact that the RTI instruction accesses certain
program controller registers, the RTI instruction must not be immediately preceded by any of the
following instructions:

MOVE(C) to the SP
Any bit-field instruction performed on the SR

An RTI instruction cannot be the last instruction in a DO loop (at the LA).
An RTI instruction cannot be repeated using the REP instruction.
Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

LF    Set according to the value pulled from the stack
I1    Set according to the value pulled from the stack
I0    Set according to the value pulled from the stack
SZ    Set according to the value pulled from the stack
L     Set according to the value pulled from the stack
E     Set according to the value pulled from the stack
U     Set according to the value pulled from the stack
N     Set according to the value pulled from the stack
Z     Set according to the value pulled from the stack
V     Set according to the value pulled from the stack
C     Set according to the value pulled from the stack

Instruction Fields:

Operation   Operands   C    W   Comments
RTI                    10   1   Return from interrupt, restoring 16-bit PC and SR from the stack

Timing: 10 + rx oscillator clock cycles
Memory: 1 program word
RTS                 Return from Subroutine                 RTS

Operation:                                      Assembler Syntax:
X:(SP) -> SR (bits 10-14); SP-1 -> SP           RTS
X:(SP) -> PC; SP-1 -> SP

Description: Return from a call to a subroutine. To perform the return, RTS pulls and discards the previously pushed
SR (except bits 10-14); the remaining SR bits are unaffected. It then pops the PC from the software
stack. The previous PC is lost. At the end of the execution, SP points to the previously used location
before entering the subroutine.

Example:
RTS                             ; pull SR (bits 10-14) and PC from the stack

Before Execution                After Execution
X:$0100  $8000                  X:$0100  $8000
X:$00FF  $754C                  X:$00FF  $754C
SR       $8009                  SR       $8009
SP       $0100                  SP       $00FE

Explanation of Example:
The example makes the assumption that during entry of the subroutine, only the LF bit (SR bit 15) is
on. During execution of the subroutine, the C and N bits were set. To perform the return, RTS restores
bits 10-14 of the SR, pops the 16-bit PC from the software stack, and updates the SP. Program
execution continues at $754C.

Restrictions:
Due to pipelining in the program controller and the fact that the RTS instruction accesses certain
program controller registers, the RTS instruction must not be immediately preceded by any of the
following instructions:

MOVE(C) to the SP

An RTS instruction cannot be the last instruction in a DO loop (at the LA).
An RTS instruction cannot be repeated using the REP instruction.

Manipulation of bits 10-14 in the stack location corresponding to the SR register may generate
unwanted behavior. These bits will read as zero during DSP read operations and should be written as
zero to ensure future compatibility.

Condition Codes Affected:
The condition codes are not affected by this instruction.

Instruction Fields:

Operation   Operands   C    W   Comments
RTS                    10   1   Return from subroutine, restoring 16-bit PC from the stack

Timing: 10 + rx oscillator clock cycles
Memory: 1 program word
SBC                 Subtract Long with Carry                 SBC

Operation:                              Assembler Syntax:
D - S - C -> D   (no parallel move)     SBC     S,D     (no parallel move)

Description: Subtract the source operand (S) and C of the CCR from the destination operand (D) and store the result
in the destination accumulator. Long words (32 bits) are subtracted from the 36-bit destination
accumulator.

Usage: This instruction is typically used in multi-precision subtraction operations (see Section 3.3.8.1, 
“Multi-Precision Addition and Subtraction,” on page 3-23) when it is necessary to subtract two num- 
bers that are larger than 32 bits, such as 64-bit or 96-bit subtraction. 

Example:
SBC     Y,A

Before Execution                After Execution
A   0  4000  0000               A   0  0000  0001
    A2  A1    A0                    A2  A1    A0
Y   3FFF  FFFE                  Y   3FFF  FFFE
     Y1    Y0                        Y1    Y0
SR  0301                        SR  0310


Explanation of Example: 
Prior to execution, the 32-bit Y register (comprised of the Y1 and Y0 registers) contains the value
$3FFF:FFFE, and the 36-bit A accumulator contains the value $0:4000:0000. In addition, C is set to one.
The SBC instruction automatically sign extends the 32-bit Y register to 36 bits and subtracts this
value from the 36-bit accumulator. In addition, C is subtracted from the LSB of this 36-bit subtraction.
The 36-bit result is stored back in the A accumulator, and the condition codes are set correctly. The
Y1:Y0 register pair is not affected by this instruction.

Note: C is set correctly for multi-precision arithmetic using long-word operands only when the extension
register of the destination accumulator (A2 or B2) contains sign extension of bit 31 of the destination
accumulator (A or B).
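
The arithmetic in this example, and the way a borrow links the two halves of a wider subtraction, can be checked with a short Python model. It is a sketch only: the names sbc and sub64 are hypothetical, the model computes the 36-bit result without modeling the condition codes, and sub64 shows the general borrow-chaining idea rather than the device's exact instruction sequence.

```python
MASK36 = (1 << 36) - 1


def sext32_to_36(value):
    """Sign extend a 32-bit long word to 36 bits."""
    value &= 0xFFFFFFFF
    return (value | (0xF << 32)) if value & 0x80000000 else value


def sbc(acc36, src32, carry):
    """Model of D - S - C -> D on a 36-bit accumulator (result only)."""
    return (acc36 - sext32_to_36(src32) - (carry & 1)) & MASK36


def sub64(a, b):
    """64-bit subtract built from two 32-bit steps linked by a borrow."""
    a_lo, a_hi = a & 0xFFFFFFFF, (a >> 32) & 0xFFFFFFFF
    b_lo, b_hi = b & 0xFFFFFFFF, (b >> 32) & 0xFFFFFFFF
    lo = (a_lo - b_lo) & 0xFFFFFFFF
    borrow = 1 if a_lo < b_lo else 0            # borrow out of the low half
    hi = (a_hi - b_hi - borrow) & 0xFFFFFFFF    # the "with carry" step
    return (hi << 32) | lo


print(f"${sbc(0x040000000, 0x3FFFFFFE, 1):09X}")
```

For the example's operands the model yields $0:0000:0001, matching the After Execution column.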
Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

L — Set if overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero; cleared otherwise
V — Set if overflow has occurred in A or B result
C — Set if a carry (or borrow) occurs from bit 35 of A or B result

Instruction Fields:

Operation   Operands   C   W   Comments
SBC         Y,F        2   1   Subtract with carry (set C bit also)

Timing: 2 oscillator clock cycles
Memory: 1 program word
STOP                 Stop Instruction Processing                 STOP

Operation:                              Assembler Syntax:
Enter the stop processing state         STOP


Description: Enter the stop processing state. All activity in the processor is suspended until the RESET pin is as- 
serted, the IRQA pin is asserted, or an on-chip peripheral asserts a signal to exit the stop processing 
state. The stop processing state is a very low-power standby mode where all clocks to the DSP core, 
as well as the clocks to many of the on-chip peripherals such as serial ports, are gated off. It is still 
possible for timers to continue to run in stop state. In these cases the timers can be individually powered 
down at the peripheral itself for lower power consumption. The clock oscillator can also be disabled 
for lowest power consumption. 


When the exit from the stop state is caused by a low level on the RESET pin, then the processor enters 
the reset processing state. The time to recover from the stop state using RESET will depend on a clock 
stabilization delay controlled by the stop delay (SD) bit in the OMR. 


When the exit from the stop state is caused by a low level on the IRQA pin, then the processor will
service the highest priority pending interrupt and will not service the IRQA interrupt unless it is highest
priority. The interrupt will be serviced after an internal delay counter counts 524,284 clock phases (that
is, [2^19 - 4]T) or 28 clock phases (that is, [2^5 - 4]T) of delay if the SD bit is set to one. During this clock
stabilization count delay, all peripherals and external interrupts are cleared and re-enabled/arbitrated
at the start of the 17T period following the count interval. The processor will resume program
execution at the instruction following the STOP instruction (the one that caused the entry into the stop state)
after the interrupts have been serviced or, if no interrupt was pending, immediately after the delay
count plus 17T. If the IRQA pin is asserted when the STOP instruction is executed, the internal delay
counter will be started. Refer to Section 7.5, “Stop Processing State,” on page 7-19 for details on the
stop mode.
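
The two delay counts quoted above follow directly from the power-of-two expressions. The Python sketch below only checks that arithmetic; the function name is hypothetical, and the extra 17T period is included as the text describes it for the case where no interrupt is pending.

```python
def stop_exit_delay_phases(sd_bit):
    """Clock phases (T) counted by the internal delay counter on exit from stop.

    sd_bit: value of the stop delay (SD) bit in the OMR (0 or 1)
    """
    return (2**5 - 4) if sd_bit else (2**19 - 4)


for sd in (0, 1):
    count = stop_exit_delay_phases(sd)
    # With no interrupt pending, execution resumes after the count plus 17T.
    print(f"SD={sd}: {count} phases, resume after {count + 17}T")
```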


Restrictions: 
A STOP instruction cannot be repeated using the REP instruction. 
A STOP instruction cannot be the last instruction in a DO loop (that is, at the LA). 


Example:
STOP                            ; enter low-power standby mode

Explanation of Example:
The STOP instruction suspends all processor activity until the processor is reset or interrupted as
previously described. The STOP instruction puts the processor in a low-power standby mode. No new
instructions are fetched until the processor exits the STOP processing state.


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Instruction Fields:

Operation   Operands   C     W   Comments
STOP                   N/A   1   Enter STOP low-power mode

Timing: The STOP instruction disables internal distribution of the clock. The time to exit the stop state depends
on the value of the SD bit.

Memory: 1 program word
SUB                 Subtract                 SUB

Operation:                              Assembler Syntax:
D - S -> D   (parallel move)            SUB     S,D     (parallel move)
D - S -> D   (two parallel reads)       SUB     S,D     (two parallel reads)

Description: Subtract the source operand (S) from the destination operand (D), and store the result in the destination
operand. Words (16 bits), long words (32 bits), and accumulators (36 bits) may be subtracted from the
destination.
Usage: This instruction can be used for both integer and fractional two’s-complement data. 
Example:
SUB     X0,A    X:(R2)+N,X0     ; 16-bit subtract, load X0, update R2

Before Execution                After Execution
A   0  0058  1234               A   0  0055  1234
    A2  A1    A0                    A2  A1    A0
X0  0003                        X0  3456


Explanation of Example: 
Prior to execution, the 16-bit X0 register contains the value $0003 and the 36-bit A accumulator
contains the value $0:0058:1234. The SUB instruction automatically appends the 16-bit value in the X0
register with 16 LS zeros, sign extends the resulting 32-bit long word to 36 bits, and subtracts the result
from the 36-bit A accumulator. Thus, 16-bit operands are always subtracted from the MSP of A or B
(A1 or B1) with the results correctly extending into the extension register (A2 or B2).

Operands of 16 bits can be subtracted from the LSP of A or B (A0 or B0). This can be achieved using
the Y register. When loading the 16-bit operand into Y0 and loading Y1 with the sign extension of Y0,
a 32-bit word is formed. Executing a SUB Y,A or SUB Y,B instruction generates the desired
operation. Similarly, the second accumulator can also be used for the source operand.

Note: Bit C is set correctly using word or long word source operands if the extension register of the
destination accumulator (A2 or B2) contains sign extension from bit 31 of the destination accumulator (A or
B). C is always set correctly using accumulator source operands.
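
The operand alignment described in the explanation (append 16 LS zeros, sign extend to 36 bits, subtract) can be sketched in Python. This is a model for illustration only; the name sub_word is hypothetical, and condition codes and the parallel move are not modeled.

```python
MASK36 = (1 << 36) - 1


def sub_word(acc36, src16):
    """Model of subtracting a 16-bit register from a 36-bit accumulator."""
    operand = (src16 & 0xFFFF) << 16      # append 16 LS zeros -> 32-bit long word
    if operand & 0x80000000:              # sign extend 32 bits -> 36 bits
        operand |= 0xF << 32
    return (acc36 - operand) & MASK36


# Values from the example: A = $0:0058:1234, X0 = $0003
print(f"${sub_word(0x000581234, 0x0003):09X}")
```

With the example's values the model returns $0:0055:1234: only A1 changes, and A0 keeps $1234 because the appended low word is zero.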
Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if limiting (parallel move) or overflow has occurred in result
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Set if overflow has occurred in A or B result
C — Set if a carry (or borrow) occurs from bit 35 of A or B result

See Section 3.6.5, “16-Bit Destinations,” on page 3-35 for cases with X0, Y0, or Y1 as D.
See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.


Instruction Fields:

Operation   Operands          C   W   Comments
SUB         DD,FDD            2   1   36-bit subtract of two registers. 16-bit source registers
            F1,DD                     are first sign extended internally and concatenated
            ~F,F                      with 16 zero bits to form a 36-bit operand.
            Y,F
            X:(SP-xx),FDD     6   1   Subtract memory word from register.
            X:aa,FDD          4   1   X:aa represents a 6-bit absolute address. Refer to
            X:xxxx,FDD        6   2   Absolute Short Address (Direct Addressing): <aa> on page 4-22.
            #xx,FDD           4   1   Subtract an immediate value 0-31
            #xxxx,FDD         6   2   Subtract a signed 16-bit immediate

Data ALU Operation            Parallel Memory Read or Write
Operation   Registers         Memory Access     Source or Destination
SUB         X0,F              X:(Rn)+           X0
            Y1,F              X:(Rn)+N          Y1
            Y0,F                                Y0
            A,B                                 A
            B,A                                 B
                                                A1
                                                B1
(F = A or B)

Data ALU Operation            First and Second Memory Reads     Destinations for Memory Reads
Operation   Registers         Read1         Read2               Destination1    Destination2
SUB         X0,A              X:(R0)+       X:(R3)+             Y0              X0
            Y1,A              X:(R0)+N      X:(R3)-             Y1              X0
            Y0,A              X:(R1)+
            X0,B              X:(R1)+N                          Valid           Valid
            Y1,B                                                destinations    destinations
            Y0,B                                                for Read1       for Read2

Timing: 2 + mv oscillator clock cycles for SUB instructions with a parallel move
        Refer to previous tables for SUB instructions without a parallel move

Memory: 1 program word for SUB instructions with a parallel move
        Refer to previous tables for SUB instructions without a parallel move
SWI                 Software Interrupt                 SWI

Operation:                              Assembler Syntax:
Begin SWI exception processing          SWI

Description: Suspend normal instruction execution and begin SWI exception processing. The interrupt priority
level, specified by the I1 and I0 bits in the SR, is set to the highest interrupt priority level upon entering
the interrupt service routine.

Example: 


SWI ; begin SWI exception processing 


Explanation of Example: 
The SWI instruction suspends normal instruction execution and initiates SWI exception processing. 


Restrictions: 
A SWI instruction cannot be repeated using the REP instruction. 


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Instruction Fields:

Operation   Operands   C   W   Comments
SWI                    8   1   Execute the trap exception at the highest interrupt
                               priority level, level 1 (non-maskable)

Timing: 8 oscillator clock cycles

Memory: 1 program word
Tcc                 Transfer Conditionally                 Tcc

Operation:                              Assembler Syntax:
If cc, then S -> D                      Tcc     S,D
If cc, then S -> D and R0 -> R1         Tcc     S,D     R0,R1


Description: Transfer data from the specified source register (S) to the specified destination accumulator (D) if the 
specified condition is true. If a second source register RO and a second destination register R1 are also 
specified, transfer data from address register RO to address register R1 if the specified condition is true. 
If the specified condition is false, a NOP is executed. 


Usage: When used after the CMP instruction, the Tcc instruction can perform many useful functions such as 
a “maximum value” or “minimum value” function. The desired value is stored in the destination accu- 
mulator. If address register RO is used as an address pointer into an array of data, the address of the 
desired value is stored in the address register R1. The Tcc instruction may be used after any instruction 
and allows efficient searching and sorting algorithms. 


The term “cc” specifies the following:

“cc” Mnemonic                                   Condition
CC (HS*) — carry clear (higher or same)         C=0
CS (LO*) — carry set (lower)                    C=1
EQ — equal                                      Z=1
GE — greater than or equal                      N ⊕ V=0
GT — greater than                               Z+(N ⊕ V)=0
LE — less than or equal                         Z+(N ⊕ V)=1
LT — less than                                  N ⊕ V=1
NE — not equal                                  Z=0

* Only available when CC bit set in the OMR
+ denotes the logical OR operator
⊕ denotes the logical exclusive OR operator
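
The condition table above maps directly onto flag tests. The Python sketch below is a model for illustration: the names CONDITIONS and tcc are hypothetical, flags are passed as a dictionary of N, Z, V, and C bits, and only the single-register transfer is modeled.

```python
# Each entry evaluates one "cc" mnemonic from the N, Z, V, and C flags.
CONDITIONS = {
    "CC": lambda f: f["C"] == 0,
    "CS": lambda f: f["C"] == 1,
    "EQ": lambda f: f["Z"] == 1,
    "GE": lambda f: (f["N"] ^ f["V"]) == 0,
    "GT": lambda f: (f["Z"] | (f["N"] ^ f["V"])) == 0,
    "LE": lambda f: (f["Z"] | (f["N"] ^ f["V"])) == 1,
    "LT": lambda f: (f["N"] ^ f["V"]) == 1,
    "NE": lambda f: f["Z"] == 0,
}


def tcc(cc, flags, src, dst):
    """Return the new destination value: src if cc holds, otherwise dst unchanged."""
    return src if CONDITIONS[cc](flags) else dst


# Flags as left by a compare whose result was positive and nonzero
flags = {"N": 0, "Z": 0, "V": 0, "C": 0}
print(tcc("GT", flags, src=3, dst=5))
```

When the condition is false the destination is returned unchanged, which corresponds to the NOP behavior in the description.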


Note: This instruction is considered to be a move-type instruction. Due to pipelining, if an address register 
(RO or R1 for the Tcc instruction) is changed using a move-type instruction, the new contents of the 
destination address register will not be available for use during the following instruction (that is, there 
is a single-instruction-cycle pipeline delay). 




Example:
CMP     X0,A                    ; compare X0 and A (sort for minimum)
TGT     X0,A    R0,R1           ; transfer X0 -> A and R0 -> R1 if X0 < A

Explanation of Example:
In this example, the contents of the 16-bit X0 register are transferred to the 36-bit A accumulator, and
the contents of the 16-bit R0 address register are transferred to the 16-bit R1 address register if the
specified condition is true. If the specified condition is not true, a NOP is executed.


Condition Codes Affected: 
The condition codes are tested but not modified by this instruction. 


Instruction Fields:

            Data ALU Transfer         AGU Transfer
Operation   Source   Destination      Source   Destination    C   W   Comments
Tcc         DD       F                (No transfer)           2   1   Conditionally transfer one register
            A        B                (No transfer)           2   1
            B        A                (No transfer)           2   1
            DD       F                R0       R1             2   1   Conditionally transfer one data ALU
                                                                      register and one AGU register
            A        B                R0       R1             2   1
            B        A                R0       R1             2   1

Note: The Tcc instruction does not allow the following condition codes: HI, LS, NN, and NR.

Timing: 2 oscillator clock cycles

Memory: 1 program word
TFR                 Transfer Data ALU Register                 TFR

Operation:                              Assembler Syntax:
S -> D   (parallel move)                TFR     S,D     (parallel move)

Description: Transfer data from the specified source data ALU register (S) to the specified destination data ALU
accumulator (D). The TFR instruction can be used to move the full 36-bit contents of one accumulator
to the other. This transfer occurs with saturation when the saturation bit, SA, is set. The TFR instruction
only affects the L and SZ bits in the CCR (which can be set by data movement associated with the
instruction’s parallel operations).

Usage: This instruction is very similar to a MOVE instruction but has two uses. First, it can be used to perform
a 36-bit transfer of one accumulator to another. Second, when used with a parallel move, this
instruction allows a register move and a memory move to occur simultaneously in 1 instruction that executes
in 1 instruction cycle.


Example:
TFR     B,A     X:(R0)+,Y1      ; move B to A and update Y1, R0

Before Execution                After Execution
A   3  0123  0123               A   A  CCCC  EEEE
    A2  A1    A0                    A2  A1    A0
B   A  CCCC  EEEE               B   A  CCCC  EEEE
    B2  B1    B0                    B2  B1    B0

Explanation of Example:
Prior to execution, the 36-bit A accumulator contains the value $3:0123:0123 and the 36-bit B
accumulator contains the value $A:CCCC:EEEE. Execution of the TFR B,A instruction moves the 36-bit
value in B into the 36-bit A accumulator.


Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if data limiting has occurred during parallel move

Instruction Fields:

Operation   Operands   C   W   Comments
TFR         DD,F       2   1   Transfer register to register
            A,B                Transfer one accumulator to another (36-bits)
            B,A

Data ALU Operation            Parallel Memory Read or Write
Operation   Registers         Memory Access     Source or Destination
TFR         X0,F              X:(Rn)+           X0
            Y1,F              X:(Rn)+N          Y1
            Y0,F                                Y0
            A,B                                 A1
            B,A                                 B1
                                                A
                                                B
(F = A or B)

Timing: 2 + mv oscillator clock cycles
Memory: 1 program word
TST                 Test Accumulator                 TST

Operation:                              Assembler Syntax:
S - 0   (parallel move)                 TST     S       (parallel move)


Description: Compare the specified source accumulator (S) with zero, and set the condition codes accordingly. No 
result is stored, although the condition codes are updated. 


Example:
TST     A       X:(R0)+N,B      ; set condition codes for the value
                                ; in A, update B and R0

Before Execution                After Execution
A   8  0203  0000               A   8  0203  0000
    A2  A1    A0                    A2  A1    A0
SR  0300                        SR  0338


Explanation of Example: 
Prior to execution, the 36-bit A accumulator contains the value $8:0203:0000, and the 16-bit SR con- 
tains the value $0300. Execution of the TST A instruction compares the value in the A register with 
zero and updates the CCR accordingly. The contents of the A accumulator are not affected. 
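
The SR change in this example ($0300 to $0338) can be reproduced with a small Python model of the E, U, N, and Z tests. The sketch rests on two assumptions that are not stated in this section: E is taken as set when bits 35-31 are not all equal, and U as set when bits 31 and 30 are equal. The function name tst_ccr is hypothetical, and the SZ and L bits (which depend on the parallel move) are left out.

```python
def tst_ccr(acc36):
    """Model of the E, U, N, Z, V, C bits produced by testing a 36-bit accumulator."""
    acc36 &= (1 << 36) - 1
    top5 = acc36 >> 31                               # bits 35..31
    e = 0 if top5 in (0b00000, 0b11111) else 1       # assumed E definition
    u = 1 if ((acc36 >> 31) & 1) == ((acc36 >> 30) & 1) else 0   # assumed U definition
    n = (acc36 >> 35) & 1
    z = 1 if acc36 == 0 else 0
    # CCR bit positions: E=5, U=4, N=3, Z=2; V (bit 1) and C (bit 0) are cleared
    return (e << 5) | (u << 4) | (n << 3) | (z << 2)


sr = 0x0300 | tst_ccr(0x802030000)
print(f"SR=${sr:04X}")
```

For A = $8:0203:0000 the model sets E, U, and N, giving the $0338 shown in the example.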


Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

SZ — Set according to the standard definition of the SZ bit (parallel move)
L — Set if data limiting has occurred during parallel move
E — Set if the signed integer portion of A or B result is in use
U — Set according to the standard definition of the U bit
N — Set if bit 35 of A or B result is set except during saturation
Z — Set if A or B result equals zero
V — Always cleared
C — Always cleared

See Section 3.6.2, “36-Bit Destinations—CC Bit Set,” on page 3-34 and Section 3.6.4, “20-Bit
Destinations—CC Bit Set,” on page 3-34 for the case when the CC bit is set.

Instruction Fields:

Operation   Operands   C   W   Comments
TST         F          2   1   Test 36-bit accumulator

Data ALU Operation            Parallel Memory Read or Write
Operation   Registers         Memory Access     Source or Destination
TST         A                 X:(Rn)+           X0
            B                 X:(Rn)+N          Y1
                                                Y0
                                                A1
                                                B1
                                                A
                                                B

Timing: 2 + mv oscillator clock cycles

Memory: 1 program word
TSTW                 Test Register or Memory                 TSTW

Operation:                              Assembler Syntax:
S - 0   (no parallel move)              TSTW    S       (no parallel move)


Description: Compare 16 bits of the specified source register or memory location with zero, and set the condition 
codes accordingly. No result is stored, although the condition codes are updated. 


Example:
TSTW    X:$0007                 ; set condition codes using X:$0007

Before Execution                After Execution
X:$0007  FC00                   X:$0007  FC00
SR       0300                   SR       0308

Explanation of Example:
Prior to execution, location X:$0007 contains the value $FC00 and the 16-bit SR contains the value
$0300. Execution of the instruction compares the value in memory location X:$0007 with zero and
updates the CCR accordingly. The value of location X:$0007 is not affected.


Note: This instruction does not set the same set of condition codes that the TST instruction does. Both in- 
structions correctly set the V, N, Z, and C bits, but TST sets the E bit and TSTW does not. This is a 
16-bit test operation when done on an accumulator (A or B), where limiting is performed if appropriate 
when reading the accumulator. 


Condition Codes Affected:

|<------------- MR ------------>|<------------- CCR ----------->|
 15  14  13  12  11  10   9   8   7   6   5   4   3   2   1   0
 LF   *   *   *   *   *  I1  I0  SZ   L   E   U   N   Z   V   C

N — Set if bit 15 (bit 31 of A or B) of result is set
Z — Set if result equals zero
V — Always cleared
C — Always cleared
Instruction Fields:

Operation   Operands        C   W   Comments
TSTW        DDDDD           2   1   Test 16-bit word in register. All registers allowed
            (except HWS)            except HWS. Limiting is not performed if an
                                    accumulator is specified.
            X:(Rn)          2   1   Test a word in memory using appropriate
            X:(Rn)+         2   1   addressing mode.
            X:(Rn)-         2   1
            X:(Rn+N)        4   1
            X:(Rn)+N        2   1
            X:(Rn+xxxx)     6   2
            X:(R2+xx)       4   1
            X:(SP-xx)       4   1
            X:aa            2   1   X:aa represents a 6-bit absolute address. Refer to
                                    Absolute Short Address (Direct Addressing): <aa> on page 4-22.
            X:pp            2   1   X:pp represents a 6-bit absolute I/O address. Refer
            X:xxxx          4   2   to I/O Short Address (Direct Addressing): <pp> on page 4-23.
            (Rn)-           2   1   Test and decrement AGU register

Timing: Refer to the preceding Instruction Fields table
Memory: Refer to the preceding Instruction Fields table


WAIT Wait for Interrupt WAIT 


Operation: Assembler Syntax: 


Disable clocks to the processor core WAIT 
and enter the wait processing state. 


Description: Enter the wait processing state. The internal clocks to the processor core and memories are gated off, 
and all activity in the processor is suspended until an unmasked interrupt occurs. The clock oscillator 
and the internal I/O peripheral clocks remain active. 


When an unmasked interrupt or external (hardware) processor reset occurs, the processor leaves the 
wait state and begins exception processing of the unmasked interrupt or reset condition. 


Restrictions: 
A WAIT instruction cannot be the last instruction in a DO loop (at the LA). 
A WAIT instruction cannot be repeated using the REP instruction. 
Example: 
WAIT ; enter low-power mode, wait for interrupt 
Explanation of Example: 
The WAIT instruction suspends normal instruction execution and waits for an unmasked interrupt or 
external reset to occur. No new instructions are fetched until the processor exits the wait processing 
state. 


Condition Codes Affected: 
The condition codes are not affected by this instruction. 


Instruction Fields:

Operation   Operands   C     W   Comments
WAIT                   n/a   1   Enter WAIT low-power mode

Timing: If an internal interrupt is pending during the execution of the WAIT instruction, the WAIT instruction
takes a minimum of 32T cycles to execute.

If no internal interrupt is pending when the WAIT instruction is executed, the period that the DSP is
in the wait state equals the sum of the period before the interrupt or reset causing the DSP to exit the
wait state and a minimum of 28T cycles to a maximum of 31T cycles (see the appropriate data sheet).

Memory: 1 program word
Appendix B
DSP Benchmarks


The following benchmarks illustrate source code syntax and programming techniques for the DSP56800. 


The assembly language source is organized into five columns, as shown in Example B-1. 


Example B-1. Source Code Layout

Label¹    Opcode²    Operands³    Data bus⁴                     Comment⁵
FIR       MAC        Y0,X0,A      X:(R0)+,Y0    X:(R3)+,X0      ;Do each tap

1. Used for program entry points and end-of-loop indication.
2. Indicates the data ALU, address ALU, or program-controller operation to be performed. This column must
   also be included in the source code.
3. Specifies the operands to be used by the opcode.
4. Specifies an optional data transfer over the data bus and the addressing mode to be used.
5. Used for documentation purposes and does not affect the assembled code.

In each code example, the number of program words that each instruction occupies and the execution
time for each are listed in the comments and summed at the end.


Table B-1 shows the number of program words and instruction cycles for each benchmark. 


Table B-1. Benchmark Summary

Benchmark                                                 Execution Time        Program Length
                                                          (# Icyc)              (# Words)
Real Correlation or Convolution (FIR Filter)              1N                    9
N Complex Multiplies                                      6N                    15
Complex Correlation or Convolution (Complex FIR)          5N                    15
Nth Order Power Series (Real, Fractional Data)            1N                    13
N Cascaded Real Biquad IIR Filters (Direct Form II)       6N                    16
N Radix 2 FFT Butterflies                                 13N                   17
LMS Adaptive Filter: Single Precision                     3N                    18
LMS Adaptive Filter: Double Precision                     6N                    21
LMS Adaptive Filter: Double Precision Delayed             5N                    27
Vector Multiply-Accumulate                                2N                    12
Energy in a Signal                                        1N                    7
[3x3][1x3] Matrix Multiply                                20                    20
[NxN][NxN] Matrix Multiply                                N^3 + 8N^2            30
N Point 3x3 2-D FIR Convolution                           13N^2 + 11N           41
Sine Wave Generation: Double Integration Technique        2N                    13
Sine Wave Generation: Second Order Oscillator             5N                    16
Array Search: Index of the Highest Signed Value           4N                    10
Array Search: Index of the Highest Positive Value         2N                    10
Proportional Integrator Differentiator (PID) Algorithm    6N                    6
Autocorrelation Algorithm                                 (p + 1) * (N - p/2)   23


B.1 Benchmark Code 


The following source code lists all the “defines” for the benchmarks. 


        page    132
        opt     cc

; define section

AD      EQU     0
BD      EQU     $100
bd      EQU     $100
C       EQU     $200
c       EQU     $200
D       EQU     $300
N       EQU     100
AR      EQU     $300
AI      EQU     $400
OUTPUT  EQU     $500
output  EQU     $FFF1
INPUT   EQU     $501
input   EQU     $FFF1
W       EQU     0

; The assignments for the remaining symbols did not survive in the source listing.
; Symbols:           mask, image, dividend, divisor, paddr, qaddr, w1, w2, s,
;                    tablebase, lpc, frame, cor, shift, table
; Surviving values:  0, $100, $80, $180
; Surviving notes:   "shift constant", "base address of a-law table"

        org     p:$40


B.1.1 Real Correlation or Convolution (FIR Filter) 


; c(n) = SUM(I=0,...,N-1) { a(I) * b(n-I) }

; [Listing: the opcode column did not survive in the source. Surviving fragments:
;  loop count #N; data ALU operands Y0,X0,A and A; parallel moves X:(R0)+,Y0,
;  X:(R3)+,X0, and X:(R0)+,Y0 X:(R3)+,X0.]

;                               Total:  9       1N+11
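
As a cross-check for an implementation of this benchmark, the sum in the comment can be written as a floating-point Python reference model. It is a sketch only: the function name fir_output is hypothetical, terms whose index falls outside b are skipped, and the DSP's fractional rounding is not modeled.

```python
def fir_output(a, b, n):
    """Reference model of c(n) = SUM(I=0..N-1) { a(I) * b(n-I) }.

    a: coefficient list of length N
    b: input sample list
    n: output index; terms with n-I outside b are skipped
    """
    return sum(a[i] * b[n - i] for i in range(len(a)) if 0 <= n - i < len(b))


print(fir_output([1, 2, 3], [4, 5, 6], 2))
```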


B.1.2 N Complex Multiplies 


; cr(I) + jci(I) = ( ar(I) + jai(I) ) * ( br(I) + jbi(I) )      I=1,...,N
; cr(I) = ar(I) * br(I) - ai(I) * bi(I)         Y1=ar
; ci(I) = ar(I) * bi(I) + ai(I) * br(I)         Y0=ai  X0=bi
        opt     cc
        MOVE    #AD,R0                          ; 2
        MOVE    #C-1,R2                         ; 2
        MOVE    #BD,R3                          ; 2
        MOVE    X:(R2),B                        ;       dummy move
        DO      #NUM,END_DO8                    ; 3
        MOVE    X:(R0)+,Y1      X:(R3)+,X0      ; 1     get ar,br
        MPY     Y1,X0,A         B,X:(R2)+       ; 1     ar*br, store imag
        MOVE    X:(R0)+,Y0                      ; 1     get ai
        MPY     Y0,X0,B         X:(R3)+,X0      ; 1     ai*br, get bi
        MACR    -Y0,X0,A                        ; 1     ar*br-ai*bi
        MACR    Y1,X0,B         A,X:(R2)+       ; 1     ar*bi+ai*br, store real
END_DO8
        MOVE    B,X:(R2)+                       ; 1
;                               Total:  15      6N+11
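
The two product equations in the header comment can be checked against a plain Python reference. This is a sketch for verification only; the function name complex_multiply is hypothetical, and rounding in the MACR steps is not modeled.

```python
def complex_multiply(ar, ai, br, bi):
    """Reference model of (ar + j*ai) * (br + j*bi)."""
    cr = ar * br - ai * bi      # real part
    ci = ar * bi + ai * br      # imaginary part
    return cr, ci


print(complex_multiply(1, 2, 3, 4))
```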


B.1.3. Complex Correlation Or Convolution (Complex FIR) 


; cr(n) + jci(n) = SUM(I=0,...,N-1)
;                  { ( ar(I) + jai(I) ) * ( br(n-I) + jbi(n-I) ) }
; cr(n) = SUM(I=0,...,N-1)
;                  { ar(I) * br(n-I) - ai(I) * bi(n-I) }        Y0=ar  Y1=br
; ci(n) = SUM(I=0,...,N-1)
;                  { ar(I) * bi(n-I) + ai(I) * br(n-I) }        Y0=ai  X0=bi
        opt     cc
        MOVE    #AD,R0                          ; 2  2
        MOVE    #BD,R3                          ; 2  2
        CLR     A               X:(R0)+,Y0      ; 1  1
        CLR     B               X:(R3)+,Y1      ; 1  1
        DO      #N,END_DOB                      ; 2  3
        MAC     Y0,Y1,A         X:(R3)+,X0      ; 1  1
        MAC     Y0,X0,B         X:(R0)+,Y0      ; 1  1
        MAC     Y0,Y1,B         X:(R3)+,Y1      ; 1  1
        MAC     -Y0,X0,A                        ; 1  1
        MOVE    X:(R0)+,Y0                      ; 1  1
END_DOB
        RND     A                               ; 1  1
        RND     B                               ; 1  1
;                               Total:  15      5N+11
; Comment fragments from the source listing (line association lost):
;       ar; ar*br, ai, bi; ar*bi; ar*bi+ai*br, ar; ar*br-ai*bi

B.1.4 Nth Order Power Series (Real, Fractional Data)

; c = SUM(I=0,...,N) { a(I) * b**I }
;   = [[[a(n)*b+a(n-1)]*b+a(n-2)]*b+a(n-3)]...
        opt     cc
        MOVE    #BD,R1                          ; 2  2
        MOVE    #AD,R0                          ; 2  2
        MOVE    X:(R1),Y0                       ; 1  1  b
        MOVE    Y0,Y1                           ; 1  1  b
        MOVE    X:(R0)+,A                       ; 1  1  get a(n)
        MOVE    X:(R0)+,B                       ; 1  1  get a(n-1)
        DO      #NUM/2,END_DOC                  ; 2  3
        MAC     A1,Y0,B         X:(R0)+,A       ; 1  1  get a(n-2), and so on
        MAC     B1,Y1,A         X:(R0)+,B       ; 1  1  get a(n-3), and so on
END_DOC
        RND     A                               ; 1  1
;                               Total:  13      1N+12
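
The nested form in the power-series comment is Horner's rule. A floating-point Python reference, assuming the coefficients are stored highest order first as the "get a(n)" comments suggest, can be used to cross-check results. The function name power_series is hypothetical, and fractional rounding is not modeled.

```python
def power_series(coeffs_high_first, b):
    """Evaluate SUM a(I) * b**I by Horner's rule.

    coeffs_high_first: [a(n), a(n-1), ..., a(0)]
    """
    acc = 0.0
    for a in coeffs_high_first:
        acc = acc * b + a       # [...[a(n)*b + a(n-1)]*b + ...]
    return acc


# a(2)=1, a(1)=2, a(0)=3 at b=0.5: 3 + 2*0.5 + 1*0.25
print(power_series([1.0, 2.0, 3.0], 0.5))
```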


B.1.5 N Cascaded Real Biquad IIR Filters (Direct Form Il) 


Many digital-filter design packages generate coefficients for direct form II IIR filters. Often, these 
coefficients are greater in magnitude than 1.0. This implementation is suitable for IIR filters with 
coefficients greater in magnitude than 1.0 because it allows the user to simply divide all coefficients 
generated by 2. 
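
The coefficient-halving approach described above can be sketched as one direct form II section in floating-point Python. This is a reference model only: the function name biquad_step and its argument order are hypothetical, the halved equations are w(n)/2 = x(n)/2 - (a1/2)*w(n-1) - (a2/2)*w(n-2) and y(n)/2 = w(n)/2 + (b1/2)*w(n-1) + (b2/2)*w(n-2), and fixed-point saturation is not modeled.

```python
def biquad_step(x, w1, w2, a1, a2, b1, b2):
    """One sample of a direct form II biquad computed with halved coefficients.

    x:      input sample x(n)
    w1, w2: state values w(n-1) and w(n-2)
    Returns (y, new_w1, new_w2).
    """
    half_w = x / 2 - (a1 / 2) * w1 - (a2 / 2) * w2       # w(n)/2
    half_y = half_w + (b1 / 2) * w1 + (b2 / 2) * w2      # y(n)/2
    w = 2 * half_w                                       # undo the scaling
    y = 2 * half_y
    return y, w, w1


# Coefficients larger than 1.0 in magnitude are halved before use
y, w1, w2 = biquad_step(1.0, 0.0, 0.0, a1=1.5, a2=0.5, b1=0.5, b2=0.25)
print(y, w1, w2)
```

Halving keeps every stored coefficient below 1.0 in magnitude for the values used here, which is the point of the scaled formulation.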


; w(n)/2 = x(n)/2 - (a1/2) * w(n-1) - (a2/2) * w(n-2)
; y(n)/2 = w(n)/2 + (b1/2) * w(n-1) + (b2/2) * w(n-2)
; D High Memory Order - w(n-2)1,w(n-1)1,w(n-2)2,w(n-1)2,...
; D Low Memory Order  - (a2/2)1,(a1/2)1,(b2/2)1,(b1/2)1,(a2/2)2,...

; This version uses two pointers.

        opt     cc
        MOVE    #W,R0                                   ; 2  2
        MOVE    #C,R3                                   ; 2  2
        MOVE    #-1,N                                   ; 1  1
        MOVE    X:input,A                               ; 1  1
        ASR     A               X:(R3)+,X0              ; 1  1  X0=a2/2
        MOVE    X:(R0)+,Y0                              ; 1  1  Y0=wn-2
        DO      #N,END_DOE                              ; 2  3
        MAC     Y0,X0,A  X:(R0)+N,Y1    X:(R3)+,X0      ; 1  1  Y1=wn-1
        MAC     Y1,X0,A         Y1,X:(R0)+              ; 1  1
        ASL     A               X:(R3)+,X0              ; 1  1  X0=b2/2
        ASR     A               A,X:(R0)+               ; 1  1  X0=b1/2
        MAC     Y0,X0,A         X:(R3)+,X0              ; 1  1
        MAC     Y1,X0,A  X:(R0)+,Y0     X:(R3)+,X0      ; 1  1
END_DOE
;                                       Total:  16      6N+11

B.1.6 N Radix 2 FFT Butterflies

This is a decimation in time (DIT), in-place algorithm. Figure B-1 gives a graphic overview and memory
map.

[Figure: butterfly with inputs A and B and outputs X = A + BW^k and Y = A - BW^k. X memory map:
ar/xr and ai/xi (pointed to by r0, r2), br/yr and bi/yi (pointed to by r3, r1), and the twiddle factors
cos(2*pi*k/N) and -sin(2*pi*k/N) (pointed to by R1). Registers shown: X0, Y0, Y1, A, B.]

Figure B-1. N Radix 2 FFT Butterflies Memory Map

; Twiddle Factor Wk = wr + jwi = cos(2*pi*k/N) + j sin(2*pi*k/N)
;       - saved on each pass
; xr = ar + wr * br - wi * bi
; xi = ai + wi * br + wr * bi
; yr = ar - wr * br + wi * bi = 2 * ar - xr
; yi = ai - wi * br - wr * bi = 2 * ai - xi

        opt     cc
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move 


move 


move 


x: (r1)+,y0 
x: (r0),b 


x: (r1)tn,yl 


x: (r3)+,x0 


; save rl, update rl to point last bi/yi 


move 
do 
push 
mac 
macr 
move 
move 
asl 
sub 
move 
mac 
pop 
macr 
asl 
sub 

end_bfly 
move 
move 


; save ri, 


#n,end_bfly 
x0 

y0,x0,b 
-yl,x0,b 


#0,n 


x: (r3) +, x0 


a,x: (r1)+ 
x:(r0)t+,a 
b,x: (r2) + 
x: (r0)+n,b 
a,x: (r1)+ 


x: (r0)t+,a 


x: (r3) +, x0 
b,x: (r2) + 


x: (r0)+n,b 


#XX,N 


b, xX: (r1)+n 



; update rl to point twiddle factors 


, 


Total: 17 
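
The butterfly equations for this benchmark can be exercised with a short floating-point model. This is a sketch for checking results, not a translation of the listing; the function name `dit_butterfly` is hypothetical. It reuses the identities yr = 2*ar - xr and yi = 2*ai - xi, so the products are formed only once.

```python
def dit_butterfly(a, b, w):
    """Radix-2 DIT butterfly: returns (x, y) = (a + b*w, a - b*w)."""
    ar, ai = a.real, a.imag
    br, bi = b.real, b.imag
    wr, wi = w.real, w.imag
    xr = ar + wr * br - wi * bi
    xi = ai + wi * br + wr * bi
    yr = 2 * ar - xr          # equals ar - wr*br + wi*bi
    yi = 2 * ai - xi          # equals ai - wi*br - wr*bi
    return complex(xr, xi), complex(yr, yi)
```

Forming y from 2a - x trades two multiply-accumulates for a shift and a subtract, which appears to be why the listing uses ASL and SUB after each MAC/MACR pair.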


B.1.7 LMS Adaptive Filter 


Figure B-2 gives a graphical representation of this implementation of the LMS adaptive filter. 


; yO=wr ; x0=br 


;b=ar 


; yl=wi 


; emulate X:(Rn) adr mode 


PoP PP PP PP BP BP BP BP BP wD 


13N+9 


push br 
b=ar+wrbr 


b=xr 


a=ar 


a=2ar-xr=yr 


b=ai 
b=aitwrbi 
pop br 
b=xi ;a=ai 
a=2ai-xi=yi 


b=ar 


save last yi 


Figure B-2. LMS Adaptive Filter Graphic Representation 
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The following three LMS adaptive filter benchmarks are provided: 


• Single precision
• Double precision
• Double precision delayed


; Notation and symbols:
;   x(n)  - Input sample at time n.
;   d(n)  - Desired signal at time n.
;   y(n)  - FIR filter output at time n.
;   H(n)  - Filter coefficient vector at time n.
;           H={c0,c1,c2,...,ck,...,c(N-1)}
;   X(n)  - Filter state variable vector at time n.
;           X={x(n),x(n-1),...,x(n-N+1)}
;   Mu    - Adaptation gain.
;   N     - Number of coefficient taps in the filter.

;   True LMS Algorithm          Delayed LMS Algorithm
;   Get input sample            Get input sample
;   Save input sample           Save input sample
;   Do FIR                      Do FIR
;   Get d(n), find e(n)         Update coefficients
;   Update coefficients         Get d(n), find e(n)
;   Output y(n)                 Output y(n)
;   Shift vector X              Shift vector X


; System equations:
;   True LMS                    Delayed LMS
;   e(n)=d(n)-H(n)X(n)          e(n)=d(n)-H(n)X(n)              (FIR filter and error)
;   H(n+1)=H(n)+uX(n)e(n)       H(n+1)=H(n)+uX(n-1)e(n-1)       (Coefficient update)


The references for this code include the following:
Adaptive Digital Filters and Signal Analysis, Maurice G. Bellanger (Marcel Dekker: 1987)
“The DLMS Algorithm Suitable for the Pipelined Realization of Adaptive Filters,” Proc. IEEE
ASSP Workshop, Academia Sinica, Beijing (IEEE: 1986)


NOTE:
The sections of code shown describe how to initialize all registers, filter an
input sample, and perform the coefficient update. Only the instructions
relating to the filtering and coefficient update are shown as part of the
benchmark. Instructions executed only once (for initialization) or
instructions that may be user application dependent are not included in the
benchmark.
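
The difference between the true and delayed LMS updates in the system equations can be seen in a short floating-point model. The sketch below is an illustration under stated assumptions, not the benchmark code: the function name `lms_identify` is hypothetical, the desired signal d(n) is generated by a known FIR system so that convergence can be checked, and no fixed-point effects are modeled.

```python
def lms_identify(unknown, samples, mu, delayed=False):
    """Adapt an FIR filter toward `unknown` using true or delayed LMS."""
    n_taps = len(unknown)
    h = [0.0] * n_taps                 # H(n), coefficient vector
    x_vec = [0.0] * n_taps             # X(n), newest sample first
    prev_x, prev_e = list(x_vec), 0.0  # X(n-1), e(n-1) for the delayed form
    for x in samples:
        x_vec = [x] + x_vec[:-1]                        # shift vector X
        d = sum(c * v for c, v in zip(unknown, x_vec))  # desired signal d(n)
        y = sum(c * v for c, v in zip(h, x_vec))        # FIR output H(n)X(n)
        e = d - y                                       # e(n) = d(n) - H(n)X(n)
        if delayed:
            # H(n+1) = H(n) + mu * X(n-1) * e(n-1)
            h = [c + mu * v * prev_e for c, v in zip(h, prev_x)]
            prev_x, prev_e = list(x_vec), e
        else:
            # H(n+1) = H(n) + mu * X(n) * e(n)
            h = [c + mu * v * e for c, v in zip(h, x_vec)]
    return h
```

With a small adaptation gain, both variants should settle on the same coefficients; the delayed form simply applies each correction one sample later, which is what lets the update overlap the next FIR pass.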
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B.1.7.1 Single Precision 


Figure B-3 shows a memory map for this implementation of the single-precision LMS adaptive filter. 


         X memory
r0    -> x(n)
         x(n-1)
         x(n-N+1)
r3,r1 -> c0
         c1
         c2
         c(N-1)


Figure B-3. LMS Adaptive Filter—Single Precision Memory Map


        opt     cc
        move    #XM,r0                          ;        start of X
        move    #N-1,m0                         ;        modulo N
        move    #-2,n                           ;        adjustment for filtering
        movep   x:input,y0                      ;        get input sample
        move    #H,r3                           ; 2  2   coefficients
        clr     a        y0,x:(r0)+             ; 1  1   save x(n)
        move    x:(r3)+,x0                      ; 1  1   get c0
        rep     #N-1                            ; 1  3   do fir
        mac     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1
        macr    y0,x0,a                         ; 1  1   last tap
        movep   a,x:output                      ; 1  1   output fir if desired
; (Get d(n), subtract fir output, multiply by “u”, put the result in y1.
; This section is application dependent.)
        move    #H,r3                           ; 2  2   coefficients
        move    r3,r1                           ; 1  1   coefficients
        move    x:(r0)+,y0                      ; 1  1   get x(n)
        move    x:(r3)+,a                       ; 1  1   a=c0
        do      #ntaps,_coefupdate              ; 2  3   update coef.
        macr    y1,y0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1
        tfr     x0,a     a,x:(r1)+              ; 1  1   copy C,
_coefupdate
        move    x:(r0)+n,y0                     ; 1  1   update r0
;                                       Total:   18 3N+18
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B.1.7.2 Double Precision 


Figure B-4 shows a memory map for this implementation of the double-precision LMS adaptive filter. 


         X memory
r0    -> x(n)
         x(n-1)
         x(n-N+1)
r1,r3 -> c0h
         c0l
         c1h
         c1l

Figure B-4. LMS Adaptive Filter—Double Precision Memory Map

        opt     cc
        move    #XM,r0                            ;        start of X
        move    #N-1,m0                           ;        modulo N
        move    #2,n
        movep   x:input,y0                        ;        get input sample
        move    #H,r3                             ; 2  2   coefficients
        clr     a        y0,x:(r0)+               ; 1  1   save x(n)
        move    x:(r3)+n,x0                       ; 1  1   get c0
        rep     #N-1                              ; 1  3   do fir
        mac     x0,y0,a  x:(r0)+,y0  x:(r3)+n,x0  ; 1  1   mac; next x
        macr    x0,y0,a                           ; 1  1   last tap
        movep   a,x:output                        ;        output fir if desired
; (Get d(n), subtract fir output, multiply by “u”, put the result in x0.
; This section is application dependent.)
        move    #H,r3                             ; 2  2   coefficients
        move    r3,r1                             ; 1  1   coefficients
        move    x:(r0)+,y0                        ; 1  1   get x(n)
        move    x:(r3)+,a                         ; 1  1   a1=c0h
        move    x:(r3)+,a0                        ; 1  1   a0=c0l
        do      #ntaps,_coefupdate                ; 2  3   update coef.
        mac     x0,y0,a  x:(r0)+,y0               ; 1  1   u e(n) x(n)+c; fetch x(n)
        move    a,x:(r1)+                         ; 1  1   save updated c()h
        move    a0,x:(r1)+                        ; 1  1   save updated c()l
        move    x:(r3)+,a                         ; 1  1   fetch next c()h
        move    x:(r3)+,a0                        ; 1  1   fetch next c()l
_coefupdate
        move    #-2,n                             ; 1  1   adjustment for filtering
        move    x:(r0)+n,y0                       ; 1  1   update r0
;                                         Total:   21 6N+18


B.1.7.3. Double Precision Delayed 


Figure B-5 shows a memory map for this implementation of the double-precision delayed LMS adaptive 
filter. 


         X memory
r0    -> x(n)
         x(n-1)
         x(n-N+1)
         c0h
r1,r3 -> c0l
         c1h
         c1l

Figure B-5. LMS Adaptive Filter—Double Precision Delayed Memory Map

; Delayed LMS algorithm with matched coefficient and data vectors
; Algorithm runs in 5N (2 coeffs processed in each 10 cycle loop)
; Data Sample is stored in Y0 and Y1.
; Coefficient is stored in X0
; Loop Gain * Error is stored in X:(R2) (will be placed in X0).
; FIR operation done in B.
; Coeff update operation done in A.
; FIR sum = a = a + c(k)*x(n-k)
; c(k) =b=c(k) flute — *x(n-k-1) 
        opt     cc
        move    #state,r0                       ; 2  2
        move    #ntaps,m0                       ; 2  2
        move    #c,r3                           ; 2  2
        move    #C0-2,r1                        ; 2  2
        move    #0,n                            ; 1  1   emulate (Rn) adr mode
        clr     b        x:(r0)+,y0             ; 1  1   y0 = x(n)
        move    x:(r0)+,y1  x:(r3)+,x0          ; 1  1   y1 = x(n-1), x0 = c0h
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B.1.8 Vector Multiply-Accumulate 


#ntaps/2,end_lms2 


y0,x0,b 


x0,a 


x0,yl,a 
x0,yl1,b 


x0,a 


x0,y0,a 


a,x: (r1)+ 
a0,x: (r1)+ 


x: (r2) +n, x0 


x: (r0)+,y0 
a,x: (r1)+ 
a0,x: (r1)+ 


x: (r2) +n, x0 


x: (r0)+,yl 


: (r3)+,a0 
: (c3)+, x0 


: (r3)+,a0 
: (r3)+,x0 


Total: 


27 


PoP BP BPP BPP BP BP BP Dd 


PoP PP PR 


PoP BP BPP PP PP BP Ww 


PoP PP 


5N+18 


a0=ck1l 
x0=c (k+1)h 


This code multiplies a vector by a scalar and adds the result to another vector. The Y0 register holds the
scalar value. Figure B-6 gives a graphical overview and memory map for the vector multiply-accumulate
code.
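
As a reference for the expected result, the operation can be written in a few lines. This is a floating-point sketch with a hypothetical function name; `scalar` plays the role of the value held in Y0.

```python
def vector_mac(a, b, scalar):
    """Return c where c[i] = a[i] + scalar * b[i]."""
    return [ai + scalar * bi for ai, bi in zip(a, b)]
```

For example, `vector_mac([1.0, 2.0, 3.0], [0.5, 0.25, -1.0], 2.0)` scales the second vector by 2 and adds it to the first, element by element.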


        opt     cc
        move    #ad,r0                          ; 2  2   point to vec a
        move    #bd,r3                          ; 2  2   point to vec b
        move    #cd,r1                          ; 2  2   point to vec c
        clr     a        x:(r3)+,x0             ; 1  1
        move    x:(r0)+,a                       ; 1  1
        do      #NUM,_vmac                      ; 2  3
        mac     y0,x0,a  x:(r0)+,y1  x:(r3)+,x0 ; 1  1
        tfr     y1,a     a,x:(r1)+              ; 1  1
_vmac
;                                       Total:   12 2N+11


[Figure: vector equation [c1 c2 c3] = [a1 a2 a3] + y0 x [b1 b2 b3]. X memory map: one pointer each to a1, a2, a3; to b1, b2, b3; and to c1, c2, c3.]

Figure B-6. Vector Multiply-Accumulate


B.1.9 Energy in a Signal


This code calculates the energy in a signal by summing together the square of each sample.


        opt     cc
        move    #ad,r0                  ; 2  2   point to signal a
        nop                             ; 1  1
        clr     a        x:(r0)+,y0     ; 1  1
        do      #NUM,_energy            ; 2  3
        mac     y0,y0,a  x:(r0)+,y0     ; 1  1
_energy
;                               Total:    7 1N+7
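
A floating-point reference for the energy calculation is given below as a sketch; the function name is hypothetical and accumulator saturation is not modeled.

```python
def signal_energy(samples):
    """Sum of the squares of all samples."""
    acc = 0.0
    for s in samples:
        acc += s * s      # one multiply-accumulate per sample
    return acc
```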
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B.1.10 [3x3][1x3] Matrix Multiply 


Figure B-7 gives a graphical overview and memory map for a [3x3][1x3] matrix multiply. 


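[Figure: matrix equation

    | c1 |   | a11 a12 a13 |   | b1 |
    | c2 | = | a21 a22 a23 | x | b2 |
    | c3 |   | a31 a32 a33 |   | b3 |

X memory map: a11, a12, a13, a21, a22, a23, a31, a32, a33 stored consecutively, followed by b1, b2, b3 and c1, c2, c3, each block with its own pointer.]


Figure B-7. [3x3][1x3] Matrix Multiply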


        opt     cc
        move    #AD,r3                          ; 2  2   point to mat a
        move    #bd,r0                          ; 2  2   point to vec b
        move    #2,m0                           ; 1  1   addr b mod 3
        move    #C,r2                           ; 2  2   point to vec c
        move    x:(r0)+,y0  x:(r3)+,x0          ; 1  1   y0=a11; x0=b1
        mpy     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   a11*b1
        mac     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   +a12*b2
        macr    y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   +a13*b3
        move    a,x:(r2)+                       ; 1  1   store c1
        mpy     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   a21*b1
        mac     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   +a22*b2
        macr    y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   +a23*b3
        move    a,x:(r2)+                       ; 1  1   store c2
        mpy     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   a31*b1
        mac     y0,x0,a  x:(r0)+,y0  x:(r3)+,x0 ; 1  1   +a32*b2
        macr    y0,x0,a                         ; 1  1   +a33*b3->c3
        move    a,x:(r2)+                       ; 1  1   store c3
;                                       Total:   20 20
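
The result of this listing can be checked against a small reference model. The sketch below assumes the matrix is stored as nine consecutive values in row-major order; the function name `matvec3` is hypothetical.

```python
def matvec3(a_rows, b):
    """Multiply a 3x3 matrix (nine row-major values) by a 3-element vector."""
    return [sum(a_rows[3 * r + k] * b[k] for k in range(3)) for r in range(3)]
```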
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B.1.11 


aii 
ak1 


aNt 


move 


macr 


move 


[NxN][NxN] Matrix Multiply 


The matrix multiplications are for square NxN matrices (all elements are in row-major format). Figure B-8 
gives a graphical overview and memory map of an [NxN][NxN] matrix multiply. 
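
To make the row-major indexing concrete, a floating-point reference model is sketched below. It is not a translation of the listing; the function name is hypothetical. Walking a column of B means stepping through memory with a stride of N, which is the role an offset register would play in an assembly version.

```python
def matmul_row_major(a, b, n):
    """Multiply two n x n matrices stored as flat row-major lists."""
    c = [0.0] * (n * n)
    for i in range(n):              # rows of A
        for j in range(n):          # columns of B
            acc = 0.0
            for k in range(n):
                # A advances by 1 along its row, B by n down its column.
                acc += a[i * n + k] * b[k * n + j]
            c[i * n + j] = acc
    return c
```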


aik 
akk 


aNk .. 


cc 
#ad, r0 
r0,yl 
#bd, r3 
#C,r2 
#N,b 
b,n 

Le 

la 
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aiN 

akN X 

aNN 
c11 .. clk 
ck1 .. ckk 
cN1 .. cNk 

Figure B-8. 

x: (r0)+,y0 
x: (r0)+,y0 


b11_.. bik 
bk1 .. bkk 
bN1 .. bNk .. 
ciN | 

ckN 

cNN 


[NxN][NxN] Matrix Multiply 


x: (r3) +n, x0 


x: (r3) +n, x0 


x: (r3)+,y0 


DSP Benchmarks 


PoP eB PP PP Pd) PP DY PP BP SY SY DY PD 


biN 


bkN 


bNN 


PoP PW) PP PP wb) PP oP PP BP dS SY DY PD 


X memory 

zac em eH 

alk 

ak1 

aN1 

a Wl 1 

b1k 

2, | cit 
point to A 


AA0086 


point to current column 


point to 


B 


output mat C 


array size 


do rows 


do columns 


copy rowA 


copy col 


clr sum & pipe 


sum 


finish, next col 


B 


save output 


ecols 


erows 




pop 
pop 
add 
move 


move 


pop 
pop 


la 

le 
yl,b 
b,yl 
#bd,r1 


la 
le 


Total: 


next row A 


DO FP FP PB 
DO FP RP RP BR 


predio 34. 
pocke SL 
Words: Cycles: 3 2 
30 ( (9+ (N-1) )N+10)N+12)= N +8N +10N+17 


first element B


B.1.12 N Point 3x3 2-D FIR Convolution 


The two-dimensional FIR uses a 3x3 coefficient mask as shown in Figure B-9. 


c11 cl12 c13 
c21 c22 c23 
c31 c32 c33 


AA0087 
Figure B-9. 3x3 Coefficient Mask 


The image is an array of 512 pixels x 512 pixels. To provide boundary conditions for the FIR filtering, the 
image is surrounded by a set of zeros such that the image is actually stored as a 514x514 array (see 
Figure B-10). 


514 


AA0088 
Figure B-10. Image Stored as 514x514 Array 

The image (with boundary) is stored in row-major order. The first element of the array is
image(1,1), followed by image(1,2). The last element of the first row is image(1,514), followed by the
beginning of the next row, image(2,1). These are stored sequentially in the array “im” in d memory. For
example:


• Image(1,1) maps to index 0.
• Image(1,514) maps to index 513.
• Image(2,1) maps to index 514.

See Table B-2 for the definitions of r0, r2, and r3.


Although many other implementations are possible, this is a realistic type of image environment where the 
actual size of the image may not be an exact power of two. Other possibilities include storing a 512x512 
image but computing only a 511x511 result, computing a 512x512 result without boundary conditions but 
throwing away the pixels on the border, and so on. 
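
The storage scheme and mask arithmetic described here can be modeled in a few lines. The sketch below assumes a square image with a one-pixel zero border stored row-major, and applies the mask without flipping it (image(n,m) is multiplied by c11, and so on). The function name and the `size` parameter are hypothetical; `size` would be 512 for the image described in the text.

```python
def fir2d_3x3(image, mask, size):
    """Apply a 3x3 mask to a (size+2) x (size+2) zero-bordered image.

    image: flat row-major list including the border
    mask:  nine coefficients c11, c12, ..., c33
    Returns the size x size output as a flat row-major list.
    """
    stride = size + 2                       # 514 for a 512 x 512 image
    out = [0.0] * (size * size)
    for r in range(size):
        for c in range(size):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += image[(r + i) * stride + (c + j)] * mask[3 * i + j]
            out[r * size + c] = acc
    return out
```

Stepping from one mask row to the next moves the read index by `stride - 2` after the third pixel, which is the kind of pointer adjustment an offset register can provide.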


Table B-2. Variable Descriptions


Variable   Description
r0         image(n,m)        image(n,m+1)        image(n,m+2)
           image(n+514,m)    image(n+514,m+1)    image(n+514,m+2)
           image(n+2*514,m)  image(n+2*514,m+1)  image(n+2*514,m+2)
r2         output image
r3         FIR coefficients

        opt     cc
        move    #coeffs,r3                        ; 2     pt to coeff.
        move    #image,r0                         ; 2     top boundary
        move    #512,y1                           ; 2
        move    #-1029,r1                         ; 2
        move    #output,r2                        ;       output image
        move    x:(r0)+,y0   x:(r3)+,x0           ; 1     y0=im(1,1), x0=c11
        move    y1,n                              ;       row i to i+1 adjust
        push    lc                                ;
        push    la                                ;
        do      y1,rows                           ; 2
        push    lc                                ;
        push    la                                ;
        do      y1,cols                           ; 2
        mpy     y0,x0,a  x:(r0)+,y0   x:(r3)+,x0  ; 1     im(1,1)*c11
        mac     y0,x0,a  x:(r0)+n,y0  x:(r3)+,x0  ; 1     +im(1,2)*c12
        mac     y0,x0,a  x:(r0)+,y0   x:(r3)+,x0  ; 1     +im(1,3)*c13
        mac     y0,x0,a  x:(r0)+,y0   x:(r3)+,x0  ; 1     +im(2,1)*c21
        mac     y0,x0,a  x:(r0)+n,y0  x:(r3)+,x0  ; 1     +im(2,2)*c22
        move    r1,n                              ;       row i to i-2 adjust
        mac     y0,x0,a  x:(r0)+,y0   x:(r3)+,x0  ; 1     +im(2,3)*c23
        mac     y0,x0,a  x:(r0)+,y0   x:(r3)+,x0  ; 1     +im(3,1)*c31
        mac     y0,x0,a  x:(r0)+n,y0  x:(r3)+,x0  ; 1     +im(3,2)*c32
        move    #0,r3                             ;       back to first coeff
        move    y1,n                              ;       row i to i+1 adjust
        macr    y0,x0,a  x:(r0)+,y0   x:(r3)+,x0  ; 1     +im(3,3)*c33
        move    a,x:(r2)+                         ; 1
cols
        pop     la                                ;
        pop     lc                                ;
; adjust pointers for frame boundary
        lea     (r0)+                             ; 1     adjust r0
        lea     (r0)+                             ; 1
        lea     (r2)+                             ; 1     adjust r2
        lea     (r2)+                             ; 1
rows
;                                         Total:   41 13N^2+11N+16
;                                         Kernel:  13
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B.1.13  Sine-Wave Generation 


The following two sine-wave generation benchmarks are provided: 
• Double integration technique
• Second order oscillator


B.1.13.1 Double Integration Technique 


Figure B-11 gives a graphical overview of the double integration technique. 


a  = Stored initial value which is the desired tone amplitude
y1 = 2*sin(πF0/Fs)
F0 = Oscillation Frequency
Fs = Sampling Frequency


Figure B-11. Sine Wave Generator—Double Integration Technique


        opt     cc
        clr     b                               ; 1  1
        move    #$4000,a                        ; 2  2
        move    #0,n                            ; 1  1
        move    #$4532,y1                       ; 2  2
        move    #S1,r1                          ; 1  1
        move    y1,y0                           ; 1  1
        do      x0,loop1                        ; 2  3
        mac     y1,b1,a  b,x:(r1)+n             ; 1  1
        mac     -y0,a1,b                        ; 1  1
loop1
        move    b,x:(r1)                        ; 1  1
;                                       Total:   13 2N+12
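
The two MAC instructions in the loop form a pair of coupled integrators. A floating-point sketch of the same update order is shown below, assuming the coefficient is 2*sin(πF0/Fs); the function name and argument names are hypothetical, and fractional rounding is not modeled.

```python
import math


def double_integration_sine(amplitude, f0, fs, count):
    """Generate `count` samples with two coupled integrators."""
    k = 2.0 * math.sin(math.pi * f0 / fs)
    a, b = amplitude, 0.0
    out = []
    for _ in range(count):
        out.append(b)        # output the current value of b
        a = a + k * b        # first integrator
        b = b - k * a        # second integrator uses the updated a
    return out
```

Because the second update uses the freshly computed `a`, the pair has unit determinant and the oscillation neither grows nor decays in exact arithmetic.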
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B.1.13.2 Second Order Oscillator 


Figure B-12 gives a graphical overview of a second order oscillator. 


a  = Stored initial value which is the desired tone amplitude
x0 = 2*cos(2πF0/Fs)
F0 = Oscillation Frequency
Fs = Sampling Frequency


Figure B-12. Sine Wave Generator—Second Order Oscillator


        opt     cc
        clr     a                               ; 1  1
        move    #$4000,y1                       ; 2  2
        move    #$6d4b,y0                       ; 2  2
        move    #S1,r1                          ; 1  1
        move    #tmp,r0                         ; 1  1
        move    #0,n                            ; 1  1
        do      x0,loop2                        ; 2  3
        mac     -y1,y0,a                        ; 1  1
        neg     a        y1,x:(r1)+n            ; 1  1
        mac     y1,y0,a                         ; 1  1
        move    a,x:(r0)+n                      ; 1  1   temp storage for swap
        tfr     y1,a     x:(r0)+n,y1            ; 1  1
loop2
        move    y1,x:(r1)                       ; 1  1
;                                       Total:   16 5N+12
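
A second order oscillator is based on the recurrence y(n) = 2*cos(θ)*y(n-1) - y(n-2), with θ = 2πF0/Fs. The sketch below is a floating-point illustration of that recurrence, not a translation of the listing: the function name is hypothetical, and the initial conditions are chosen here so that the output equals amplitude*sin(nθ), which may differ from the initialization used in the benchmark.

```python
import math


def second_order_sine(amplitude, f0, fs, count):
    """Generate amplitude*sin(n*theta) with a two-term recurrence."""
    theta = 2.0 * math.pi * f0 / fs
    coeff = 2.0 * math.cos(theta)
    y_nm2 = -amplitude * math.sin(theta)    # y(-1)
    y_nm1 = 0.0                             # y(0)
    out = []
    for _ in range(count):
        out.append(y_nm1)
        y_nm2, y_nm1 = y_nm1, coeff * y_nm1 - y_nm2
    return out
```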
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B.1.14 Array Search 


The following two array search benchmarks are provided: 
• Index of the highest signed value
• Index of the highest positive value


B.1.14.1 Index of the Highest Signed Value 


        opt     cc
        move    #AD,r0                  ; 2  2
        clr     a        x:(r0)+,b      ; 1  1
        do      #N,end_lp3              ; 2  3
        abs     b                       ; 1  1
        cmp     b,a                     ; 1  1
        tle     b,a      r0,r1          ; 1  1
        move    x:(r0)+,b               ; 1  1
end_lp3
        lea     (r1)-                   ; 1  1
        lea     (r1)-                   ; 1  1
;                               Total:   10 4N+8 (worst case)


B.1.14.2 Index of the Highest Positive Value 


        opt     cc
        move    #AD,r0                  ; 2  2
        clr     a        x:(r0)+,x0     ; 1  1
        do      #N/2,end_lp3            ; 2  3
        cmp     x0,a     x:(r0)+,y0     ; 1  1
        tle     x0,a     r0,r1          ; 1  1
        cmp     y0,a     x:(r0)+,x0     ; 1  1
        tle     y0,a     r0,r1          ; 1  1
end_lp3
        lea     (r1)-                   ; 1  1
        lea     (r1)-                   ; 1  1
;                               Total:   10 2N+8 (worst case)
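
The compare-and-conditionally-transfer pattern in these searches can be modeled as follows. This sketch covers the highest-positive-value case, where the running maximum starts at zero; the function name is hypothetical. The `<=` comparison mirrors a transfer on "less than or equal", so a later element that ties the current maximum replaces it.

```python
def index_of_highest_positive(values):
    """Return the index of the largest non-negative value, or None."""
    best, best_index = 0.0, None
    for i, v in enumerate(values):
        if best <= v:              # ties move to the later element
            best, best_index = v, i
    return best_index
```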
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B.1.15 Proportional Integrator Differentiator (PID) Algorithm 


The proportional integrator differentiator (PID) algorithm is the most commonly used algorithm in control
applications. Figure B-13 gives a graphical overview and memory map of this implementation of a
proportional integrator differentiator.


y(n) = y(n-1) + k0 x(n) + k1 x(n-1) + k2 x(n-2)


      X memory
r3 -> k0
      k1
      k2
      x(n-1)
      x(n-2)
r0 -> x(n)


Figure B-13. Proportional Integrator Differentiator Algorithm


; y(n) = y(n-1) + k0 x(n) + k1 x(n-1) + k2 x(n-2)
        opt     cc
        move    #s+2,r0
        move    #2,m0                                   ;        r0 mod 3
        move    #k,r3
        move    x:(r0)+,b                               ; 1  1
        move    x:(r0)+,y0  x:(r3)+,x0                  ; 1  1
        mac     x0,y0,b  x:(r0)+,y0  x:(r3)+,x0         ; 1  1
        mac     y0,x0,b  x:(r3)+,x0                     ; 1  1
        movep   x:input,b                               ; 1  1
        macr    y0,x0,b                                 ; 1  1
        move    b,x:(r0)                                ; 1  1
        movep   b,x:output                              ; 1  1
;                                               Total:    8  8

A faster version of the PID


; y(n) = y(n-1) + k0 x(n) + k1 x(n-1) + k2 x(n-2)


        opt     cc
        move    #s+2,r0                                 ;
        move    #2,m0                                   ;        r0 mod 3
        move    #k,r3                                   ;


B accumulator holds y(n-1), Y1 holds the k0 coefficient


        move    x:(r0)+,y0  x:(r3)+,x0                  ; 1  1   get x(n-2),k2
        mac     x0,y0,b  x:(r0)+,y0  x:(r3)-,x0         ; 1  1   get x(n-1),k1
        mac     y0,x0,b                                 ; 1  1
        movep   x:input,b                               ; 1  1   get x(n)
        macr    y0,x0,b  b,x:(r0)+                      ; 1  1   save x(n)
        movep   b,x:output                              ; 1  1   y(n) in b
;                                               Total:    6  6
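
The difference equation y(n) = y(n-1) + k0 x(n) + k1 x(n-1) + k2 x(n-2) can be modeled directly to check expected outputs. The class below is a floating-point sketch with hypothetical names; it keeps the previous output and the last two inputs as state, and does not model rounding or saturation.

```python
class IncrementalPID:
    """y(n) = y(n-1) + k0*x(n) + k1*x(n-1) + k2*x(n-2)."""

    def __init__(self, k0, k1, k2):
        self.k0, self.k1, self.k2 = k0, k1, k2
        self.x1 = 0.0      # x(n-1)
        self.x2 = 0.0      # x(n-2)
        self.y = 0.0       # y(n-1)

    def step(self, x):
        self.y = self.y + self.k0 * x + self.k1 * self.x1 + self.k2 * self.x2
        self.x2, self.x1 = self.x1, x
        return self.y
```

Each call to `step` consumes one input sample and returns the new output, so the object can be driven sample by sample in the same way as the benchmark loop.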


B.1.16 Autocorrelation Algorithm 


        move    #cor,r1                         ; 2  2
        move    #frame,r2                       ; 2  2
        do      #lpc+1,_loop1                   ; 2  3
        move    r2,r3                           ; 1  1
        clr     b                               ; 1  1
        move    #frame,r0                       ; 2  2
        lea     (r2)+                           ; 1  1
        move    lc,y1                           ; 1  1
        move    #>N-(p+1),a                     ; 2  2
        add     y1,a     x:(r0)+,y0  x:(r3)+,x0 ; 1  1
        rep     a                               ; 1  3
        mac     y0,x0,b  x:(r0)+,y0  x:(r3)+,x0 ; 1  1
        move    b0,x:(r1)+                      ; 1  1
        move    b1,x:(r1)+                      ; 1  1
_loop1 7 —___ 2 
; 23 (pt1) 
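
For reference, the quantity computed by an autocorrelation routine of this kind can be sketched as below. This floating-point model assumes lags 0 through p over a frame of N samples, with a shorter sum for each larger lag; the function name is hypothetical, and the double-precision storage of each result (low word, then high word) in the listing is not modeled.

```python
def autocorrelation(frame, p):
    """Return [r(0), ..., r(p)] with r(k) = sum over n of frame[n]*frame[n+k]."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(p + 1)
    ]
```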
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Glossary 


See Section A.1, “Notation,” on page A-1 for notations and symbols not listed here. 


A/D        analog-to-digital
ADM        application development module
ADS        application development system
AGU        address generation unit
ALU        arithmetic logic unit
AS         accumulator shifter
BCR        bus control register
BE1–BE0    breakpoint enable bits
BK4–BK0    breakpoint configuration bits
BS1–BS0    breakpoint selection bits


C          carry bit
CC         condition code bit
CCR        condition code register
CID        chip identification register
CGDB       core global data bus
CMOS       complementary metal oxide semiconductor
COFF       common object file format
COP        computer operating properly
COPDIS     COP timer disable
CPU        central processing unit
CS         carry bit set
D/A        digital-to-analog
DAC        digital-to-analog converter
DRM        debug request mask bit
DSP        digital signal processor
E          extension bit
EM1–EM0    event modifier bits
EX         external X memory bit
EXT        extension register
FH         FIFO halt bit
FIFO       first-in-first-out
GE         greater than or equal to
GPIO       general-purpose input/output
GT         greater than
GUI        graphical user interface
HBO        hardware breakpoint occurrence
HI         high
HS         high or same


HWS        hardware stack
I1, I0     interrupt mask bits
IC         integrated circuit
I/O        input/output
IPL        interrupt priority level
IPR        interrupt priority register
JTAG       Joint Test Action Group
K&R        Kernighan and Ritchie
L          limit bit
LA         loop address register
LC         loop counter register
LE         less than or equal to
LF         loop flag bit
LIFO       last-in-first-out
LO         low
LS         least significant; low or same
LSB        least significant bit
LSP        least significant portion
LT         less than
MA, MB     operating modes
MAC        multiply-accumulate
MCU        microcontroller unit
MIPS       million instructions per second
M01        modifier register
MR         mode register
MS         most significant
MSB        most significant bit
MSP        most significant portion


N          offset register
N          negative bit in condition code register
NL         nested looping bit
OBAR       OnCE breakpoint address register
OCMDR      OnCE command register
OCNTR      OnCE breakpoint counter
ODEC       OnCE decoder
OISR       OnCE input shift register
OMAC       OnCE memory address comparator
OMAL       OnCE breakpoint address latch
OMR        operating mode register
OPABDR     OnCE PAB decode register
OPABER     OnCE PAB execute register
OPABFR     OnCE PAB fetch register
OPDBR      OnCE PDB register
OPGDBR     OnCE PGDB register
OS1, OS0   OnCE status bits
OSR        OnCE status register
OnCE™      On-Chip Emulation (unit)
P2–P0      program counter extension
PAB        program address bus
PC         program counter
PGDB       peripheral global data bus
           power-down mode bit
PLL        phase-locked loop
R          rounding bit
Rn         address registers (R0–R3)
SA         saturation bit


SBO        software breakpoint occurrence
SD         stop delay bit
SP         stack pointer
SPI        serial peripheral interface
SR         status register
SSI        synchronous serial interface
SZ         size bit
TAP        test access port
TO         trace occurrence
U          unnormalized bit
V          overflow bit
WWW        World Wide Web
           external
XAB1       X memory address bus one
XAB2       X memory address bus two
XDB2       X memory data bus two
           X/P memory bit
Z          zero bit
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Index 


A 


A accumulator 3-2, 3-4 
A0, see A accumulator
A1, see A accumulator
A2 accumulator extension register 3-2 
ABS A-28 
Absolute Value ABS A-28 
accumulator extension register (A2 or B2) 3-4 
accumulator extension registers 3-2 
accumulator registers 3-2, 3-4 
accumulator shifter 3-2, 3-6 
accumulator sign-extend 8-7 
ADC A-30 
ADD A-32 
Add ADD A-32 
Add Long with Carry ADC A-30 
addition 
fractional 3-18 
multi-precision 3-23 
unsigned 3-22 
Address Generation Unit (AGU) 2-3, 4-1 
address registers (R0-R3) 4-4
incrementer/decrementer unit 4-5 
Modifier Register (M01) 4-5 
modulo arithmetic unit 4-5 
Offset Register (N) 4-4 
Stack Pointer Register (SP) 4-4 
address register indirect modes 4-7 
addressing modes 4-1, 4-6, A-6 
addressing modes summary 4-23 
AGU, see Address Generation Unit (AGU) 4-1 
ALU, see Data Arithmetic Logic Unit (ALU) 
analog signal processing 1-5 
analog-to-digital 1-6 
AND A-35 
ANDC A-36 
arithmetic 
division 3-21 
multiplication 3-19 
unsigned 3-22, 3-36 
arithmetic instructions 6-6
Arithmetic Right Shift with Accumulate ASRAC A-44
Arithmetic Shift Left ASL A-38
Arithmetic Shift Right ASR A-42
array indexes 8-26
ASL A-38
ASLL A-40
ASR A-42
ASRAC A-44
ASRR A-46


B


B accumulator 3-2, 3-4
B0, see B accumulator
B1, see B accumulator
B2 accumulator extension register 3-2
barrel shifter 3-2, 3-5
Bcc A-48
BEC 8-4
benchmarks B-1
BES 8-4
BFCHG A-50
BFCLR A-52
BFSET A-54
BFTSTH A-56
BFTSTL A-58
bit-manipulation instructions 6-8
bit-manipulation unit 2-5
BLC 8-4
BLS 8-4
BMI 8-4
bootstrap memory 2-8
boundary scan cell 9-1
BPL 8-4
BR1CLR operation 8-3
BR1SET operation 8-3
BRA A-59
Branch BRA A-59
Branch Conditionally Bcc A-48
Branch if Bits Clear BRCLR A-60
Branch if Bits Set BRSET A-62
branching techniques, software 8-2
BRCLR A-60
BRSET A-62
bus unit 2-5
BVC 8-4
BVS 8-4


C


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C condition bit 5-7, A-10 

CC, see condition code (CC) bit 

CCR, see Condition Code Register (CCR) 
CGDB, see core global data bus (CGDB) 
Clear Accumulator CLR A-64 




CLR A-64 
CMP A-66 
Compare CMP A-66 
comparing 3-18 
condition code (CC) bit 3-33, 3-34, 3-35, 3-36, 5-12 
condition code computation A-7 
condition code generation 3-33 
Condition Code Register (CCR) 5-6 
Condition Codes 
carry (C) condition 5-7, A-10 
effect of CC bit A-11 
effect of SA bit A-11 
extension in use (E) condition 5-8, A-8 
limit (L) condition 5-8, A-8 
negative (N) condition 5-7, A-9 
overflow (V) condition 5-7, A-10 
size (SZ) condition 5-8, A-7 
unnormalized (U) condition 5-8, A-9 
zero (Z) condition 5-7, A-10 
convergent rounding 3-30 
core global data bus (CGDB) 2-5 


D 


data ALU input registers (X0, Y1, and Y0) 3-4
Data ALU, see Data Arithmetic Logic Unit (ALU) 
Data Arithmetic Logic Unit (ALU) 2-3, 3-1 
accumulator registers (A and B) 3-4 
accumulator shifter 3-6 
barrel shifter 3-5 
Data Limiter 3-6, 3-26 
input registers (X0, Y1, and Y0) 3-4
logic unit 3-5 
MAC Output Limiter 3-6, 3-28 
multiply-accumulator (MAC) 3-5 
Data Limiter 3-2, 3-6, 3-26 
DEBUG A-68 
debug processing state 7-1, 7-22 
DEC(W) A-69 
Decrement Word DEC(W) A-69 
digital signal processing 1-6 
digital-to-analog 1-6 
DIV A-71 
Divide Iteration DIV A-71 
division 3-21, 8-13 
fractional 3-21, 8-13 
integer 3-21, 8-13 
DO A-73 
DO looping 5-15 
DO loops 8-20 
DSP56800 1-1 
DSP56800 core 1-2 
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E 


E condition bit 5-8, A-8 

End Current DO Loop ENDDO A-77 
ENDDO A-77 

Enter Debug Mode DEBUG A-68 
EOR A-79 

EORC A-81 

EX, see external X memory (EX) 
exception processing state 7-1, 7-5 
extension register (A2 or B2) 3-4 
external data memory 2-7 

external X memory (EX) 5-11 


F 


fractional arithmetic 3-14 
fractional division 3-21, 8-13 
fractional multiplication 3-19 


H 


hardware interrupt sources 7-10 
Hardware Stack (HWS) 5-6 


I


I1 and I0 interrupt mask bits 5-8
ILLEGAL A-83 

Illegal Instruction Interrupt ILLEGAL A-83 
IMPY(16) A-84 

INC(W) A-86 

Increment Word INC(W) A-86 
incrementer/decrementer unit 4-5 
indexes 8-26 

instruction decoder 5-3 

instruction execution pipelining 6-30 
instruction formats 6-3 

instruction groups 6-6 

instruction latch 5-3 

Instruction Processing 6-30 
instruction set restrictions A-26 
instruction set summary 6-17 
instruction timing A-16 

integer arithmetic 3-14, 3-20 
integer division 3-21, 8-13 

integer multiplication 3-20 

Integer Multiply IMPY(16) A-84 
interrupt arbitration 7-12 

interrupt control unit 5-3 

interrupt latency 7-16 

interrupt mask (I1 and I0) 5-8
interrupt pipeline 7-14
interrupt priority level (IPL) 5-3
Interrupt Priority Register (IPR) 7-9
interrupt priority structure 7-8 
interrupt sources 7-9 
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hardware 7-10 
other 7-11 
software 7-11 
interrupt vector table 7-7 
interrupts 8-30 
IPL, see interrupt priority level (IPL) 
IPR, see Interrupt Priority Register (IPR) 


J 


Jcc A-88 

JEC 8-4 

JES 8-4 

JLC 8-4 

JLS 8-4 

JMI 8-4 

JMP A-90 

Joint Test Action Group (JTAG), see JTAG 
JPL 8-4 

JR1CLR operation 8-3 

JR1SET operation 8-3 

JRCLR operation 8-2 

JRSET operation 8-2 

JSR A-91 

JTAG 9-2 

JTAG port 9-2 

Jump Conditionally Jcc A-88
Jump JMP A-90 

Jump to Subroutine JSR A-91 
jump with register argument 8-33 
jumping techniques, software 8-2 
JVC 8-4 

JVS 8-4 


L 


L condition bit 5-8, A-8 

LEA A-92 

LF, see loop flag (LF) 

Load Effective Address LEA A-92 

local variables 8-28 

logic unit 3-5 

Logical AND A-35 

Logical AND, Immediate ANDC A-36 
Logical Complement NOT A-139 

Logical Complement with Carry NOTC A-140 
Logical Exclusive OR EOR A-79 

Logical Exclusive OR Immediate EORC A-81 
Logical Inclusive OR Immediate ORC A-144 
Logical Inclusive OR OR A-142 

logical instructions 6-7 

logical operations 3-19 


Logical Right Shift with Accumulate LSRAC A-99 


Logical Shift Left LSL A-93 
Logical Shift Right LSR A-97 
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Loop Address Register (LA) 5-5 
Loop Count Register (LC) 5-4 
loop flag (LF) 5-9 

looping control unit 5-4 
looping instructions 6-9 
looping termination 5-15 
loops 5-14, 8-20 

LSL A-93 

LSLL A-95 

LSR A-97 

LSRAC A-99 

LSRR A-101 


M


M01, see Modifier Register (M01)
MAC 3-2, A-103 
MAC Output Limiter 3-6, 3-28 
MAC, see multiply-accumulator (MAC) 
MACR A-105 
MACSU A-108 
MAX operation 8-6 
MB and MA, see operating mode (MB and MA) 
memory access processing 6-31 
MIN operation 8-7 
Mode Register (MR) 5-6 
Modifier Register (M01) 4-5 
modulo arithmetic unit 4-5 
MOVE A-110, A-112, A-114 
Move Absolute Short MOVE(S) A-126 
Move Control Register MOVE(C) A-116 
Move Immediate MOVE(I) A-120 
move instructions 6-9 
Move Peripheral Data MOVE(P) A-124 
Move Program Memory MOVE(M) A-122 
MOVE(C) A-116 
MOVE(I) A-120 
MOVE(M) A-122 
MOVE(P) A-124 
MOVE(S) A-126 
MPY A-128 
MPYR A-130 
MPYSU A-132 
MR, see Mode Register (MR) 
Multi-Bit Arithmetic Left Shift ASLL A-40 
Multi-Bit Arithmetic Right Shift ASRR A-46 
Multi-Bit Logical Left Shift LSLL A-95 
Multi-Bit Logical Right Shift LSRR A-101 
multiplication 3-19 

fractional 3-19 

integer 3-20 

multi-precision 3-23 

unsigned 3-22 
Multiply Accumulate and Round MACR A-105 
Multiply-Accumulate MAC A-103 
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Multiply-Accumulate Signed x Unsigned 

MACSU A-108 
multiply-accumulator (MAC) 3-2, 3-5 
multi-tasking 8-34 


N 


N condition bit 5-7, A-9 

N, see Offset Register (N) 

NEG A-134 

Negate Accumulator NEG A-134 
NEGW 8-4 

nested looping 5-15 

nested looping bit (NL) 5-13 
NL, see nested looping bit (NL) 
No Operation NOP A-136 

NOP A-136 

NORM A-137 

normal processing state 7-1, 7-2 


NOT A-139 
notations A-1 
NOTC A-140 


O 


Offset Register (N) 4-4 
OMR, see Operating Mode Register (OMR) 
OnCE 2-5 
OnCE pipeline 9-7 
OnCE port 
FIFO history buffer 9-7 
overview 9-4 
PAB FIFO 9-7 
OnCE port architecture 9-5 
On-Chip Emulation (OnCE) 2-5 
operating mode (MB and MA) 5-10 
Operating Mode Register (OMR) 5-9 
Condition Code bit (CC) 5-12, A-11 
External X memory bit (EX) 5-11 
Nested Looping bit (NL) 5-13 
Operating Mode bits (MB and MA) 5-10 
Rounding bit (R) 5-12 
Saturation bit (SA) 5-11, A-11 
Stop Delay bit (SD) 5-12 
OR A-142 
ORC A-144 


P 


Parallel Move—Dual Parallel Reads A-114 
parallel moves 6-1 

Parallel Move—Single Parallel Move A-112 
parameters, passing subroutine 8-28 

PC, see Program Counter (PC) 

PDB, see program data bus (PDB) 
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Normalize Accumulator Iteration NORM A-137 


peripheral blocks 1-3 
peripheral data bus 2-5 
PGDB, see peripheral global data bus (PGDB) 
phase-locked loop (PLL) 2-8 
pipeline dependencies 4-33 
pipelining 6-30 
PLL, see phase-locked loop (PLL) 
POP A-146 
Pop from Stack POP A-146 
power consumption 7-19 
processing states 7-1 

debug 7-1, 7-22 

exception 7-1, 7-5 

normal 7-1, 7-2 

reset 7-1 

stop 7-1, 7-19 

wait 7-1, 7-17 
program control instructions 6-11 
Program Controller 2-4 
Program Counter (PC) 5-3 
program data bus (PDB) 2-5 
program memory 2-8 
programming model 2-8, 6-5 
PUSH operation 8-19 


R 


R rounding bit 5-12 
R0-R3 4-4
register direct addressing modes 4-7 
REP A-147 
repeat looping 5-14 
Repeat Next Instruction REP A-147 
reset processing state 7-1 
entering 7-1 
leaving 7-2 
restrictions, instruction set A-26 
Return from Interrupt RTI A-156 
Return from Subroutine RTS A-158 
RND A-150 
ROL A-152 
ROR A-154 
Rotate Left ROL A-152 
Rotate Right ROR A-154 
Round Accumulator RND A-150 
rounding 3-30
    convergent 3-30
    two’s-complement 3-31
Rounding bit (R) 5-12 
RTI A-156 
RTS A-158 


S 


saturation 3-26 




Saturation bit (SA) 5-11 
SBC A-159 
SD stop delay bit 5-12 
shift operations 8-8 
Signed Multiply and Round MPYR A-130 
Signed Multiply MPY A-128 
Signed Unsigned Multiply MPYSU A-132 
software interrupt sources
    illegal instruction (III) 7-11
    software interrupt (SWI) 7-11
Software Interrupt SWI A-165 
software stack 5-13 
SP, see Stack Pointer Register (SP) 
SR, see Status Register (SR) 
Stack Pointer Register (SP) 4-4 
Start Hardware Do Loop DO A-73 
Status Register (SR) 5-6
    carry bit (C) 5-7
    extension bit (E) 5-8
    interrupt mask bits (I1 and I0) 5-8
    limit bit (L) 5-8
    loop flag bit (LF) 5-9
    negative bit (N) 5-7
    overflow bit (V) 5-7
    reserved bits 5-9
    size bit (SZ) 5-8
    unnormalized bit (U) 5-8
    zero bit (Z) 5-7
STOP A-161 
stop delay (SD) 5-12 
STOP instruction 7-19 
Stop Instruction Processing STOP A-161 
stop processing state 7-1, 7-19 
SUB A-162 
Subtract Long with Carry SBC A-159 
Subtract SUB A-162 
subtraction
    fractional 3-18
    multi-precision 3-23
SWI A-165 
SZ condition bit 5-8, A-7 


T 


TAP, see test access port (TAP) 

Tcc A-166 

test access port (TAP) 9-2 

Test Accumulator TST A-170 

Test Bitfield and Change BFCHG A-50 
Test Bitfield and Clear BFCLR A-52 
Test Bitfield and Set BFSET A-54 

Test Bitfield High BFTSTH A-56 

Test Bitfield Low BFTSTL A-58 

Test Register or Memory TSTW A-172 
TFR A-168 




time-critical loops 8-29 

Transfer Conditionally Tcc A-166 
Transfer Data ALU Register TFR A-168 
TST A-170 

TSTW A-172 

two’s-complement rounding 3-31 


U 


U condition bit 5-8, A-9 
unsigned arithmetic 3-22
    addition 3-22
    condition code computation 3-22
    multiplication 3-22
    subtraction 3-22
unsigned load of an accumulator 8-7 


V 
V condition bit 5-7, A-10 
W


WAIT A-174 
Wait for Interrupt WAIT A-174
wait processing state 7-1, 7-17 


X 


X0 input register 3-2, 3-4

XAB1 2-5

XAB2 2-5 

XCHG register exchange operation 8-6 
XDB2 2-5 


Y 


Y0 input register 3-2, 3-4
Y1 input register 3-2, 3-4 


Z 
Z condition bit 5-7, A-10 